Compare commits

4 Commits

| Author | SHA1 | Date |
|--------|------|------|
| | 52914d50b0 | |
| | bb4bee51ba | |
| | 4ac8a6d60b | |
| | d8f45ca520 | |

ARCHITECTURE_V2.md (new file, 502 lines)

@@ -0,0 +1,502 @@
# MCP v2 -- Multi-Node Control Plane

## Overview

MCP v2 introduces multi-node orchestration with a master/agent topology.
The CLI no longer dials agents directly. A dedicated **mcp-master** daemon
coordinates deployments across nodes, handles cross-node concerns (edge
routing, certificate provisioning, DNS), and serves as the single control
point for the platform.

### Motivation

v1 deployed successfully on a single node (rift) but exposed operational
pain points as services needed public-facing routes through svc:

- **Manual edge routing**: Exposing mcq.metacircular.net required hand-editing
  mc-proxy's TOML config on svc, provisioning a TLS cert manually, updating
  the SQLite database when the config and database diverged, and debugging
  silent failures. Every redeployment risked breaking the public route.

- **Dynamic port instability**: The route system assigns ephemeral host ports
  that change on every deploy. svc's mc-proxy pointed at a specific port
  (e.g., `100.95.252.120:48080`), which went stale after redeployment.
  Container ports are also localhost-only under rootless podman, requiring
  explicit Tailscale IP bindings for external access.

- **$PORT env override conflict**: The mcdsl config loader overrides
  `listen_addr` from `$PORT` when routes are present. This meant containers
  ignored their configured port and listened on the route-allocated one
  instead, breaking explicit port mappings that expected the config port.

- **Cert chain issues**: mc-proxy requires full certificate chains (leaf +
  intermediates). Certs provisioned outside the standard metacrypt flow
  were leaf-only and caused silent TLS handshake failures (`client_bytes=7
  backend_bytes=0` with no error logged).

- **mc-proxy database divergence**: mc-proxy persists routes in SQLite.
  Routes added via the admin API override the TOML config. Editing the TOML
  alone had no effect until the database was manually updated -- a failure
  mode that took hours to diagnose.

- **No cross-node coordination**: The v1 CLI talks directly to individual
  agents. There is no mechanism for one agent to tell another "set up a
  route for this service." Every cross-node operation was manual.

v2 addresses all of these by making the master the single coordination
point for deployments, with agents handling local concerns (containers,
mc-proxy routes, cert provisioning) on instruction from the master.

### What Changes from v1

| Concern | v1 | v2 |
|---------|----|----|
| CLI target | CLI dials agents directly | CLI dials the master |
| Node awareness | CLI routes by `node` field in service defs | Master owns the node registry |
| Service definitions | Live on operator workstation | Pushed to master, which distributes to agents |
| Edge routing | Manual mc-proxy config on svc | Master coordinates agent-to-agent setup |
| Cert provisioning | Agent provisions for local mc-proxy only | Any agent can provision certs (edge included) |
| DNS registration | Agent registers records on deploy | Master coordinates DNS across zones |

### What Stays the Same

The agent's core responsibilities are unchanged: it manages containers via
podman, stores its local registry in SQLite, monitors for drift, and alerts
the operator. The agent gains new RPCs for edge routing but does not become
aware of other nodes -- the master handles all cross-node coordination.

---
## Topology

```
Operator workstation (vade)
┌──────────────────────────┐
│ mcp (CLI)                │
│                          │
│ gRPC ────────────────────┼─── overlay ───┐
└──────────────────────────┘               │
                                           ▼
                Master node (straylight)
┌──────────────────────────────────────────────────────┐
│ mcp-master                                           │
│  ├── node registry (all nodes, roles, addresses)     │
│  ├── service definitions (pushed from CLI)           │
│  └── deployment coordinator                          │
│                                                      │
│ mcp-agent                                            │
│  ├── mcns container                                  │
│  ├── metacrypt container                             │
│  ├── mcr container                                   │
│  └── mc-proxy (straylight)                           │
└──────────┬──────────────────────────┬────────────────┘
           │                          │
        overlay                    overlay
           │                          │
           ▼                          ▼
  Worker node (rift)           Edge node (svc)
┌─────────────────────┐   ┌─────────────────────────┐
│ mcp-agent           │   │ mcp-agent               │
│  ├── exo            │   │  ├── mc-proxy (svc)     │
│  ├── mcq            │   │  └── (edge routes only) │
│  ├── mcdoc          │   │                         │
│  ├── sgard          │   │ Edge routes:            │
│  ├── kls            │   │  mcq.metacircular.net   │
│  └── mc-proxy       │   │  mcdoc.metacircular.net │
│      (rift)         │   │  exo.metacircular.net   │
└─────────────────────┘   │  sgard.metacircular.net │
                          └─────────────────────────┘
```

### Node Roles

| Role | Purpose | Nodes |
|------|---------|-------|
| **master** | Runs mcp-master + mcp-agent. Hosts core infrastructure (mcns, metacrypt, mcr). Single coordination point. | straylight |
| **worker** | Runs mcp-agent. Hosts application services. | rift |
| **edge** | Runs mcp-agent. Terminates public TLS, forwards to internal services. No application containers. | svc |

Every node runs an mcp-agent. The master node also runs mcp-master.
The master's local agent manages the infrastructure services (mcns,
metacrypt, mcr) the same way rift's agent manages application services.

### mc-proxy Mesh

Each node runs its own mc-proxy instance. They form a routing mesh:
```
mc-proxy (straylight)
  ├── :443   L7 routes for metacrypt-web, mcr-web
  ├── :8443  L4 passthrough for metacrypt-api, mcr-api
  └── :9443  L4 passthrough for gRPC services

mc-proxy (rift)
  ├── :443   L7 routes for internal .svc.mcp hostnames
  └── :8443  L4/L7 routes for internal APIs

mc-proxy (svc)
  └── :443   L7 termination for public hostnames
             → forwards to internal .svc.mcp endpoints
```

---
## mcp-master

The master is a new binary that coordinates cross-node operations. It is
**not** a replacement for the agent -- it sits above agents and orchestrates
them.

### Responsibilities

1. **Accept CLI commands** via gRPC (deploy, undeploy, status, sync).
2. **Route deployments** to the correct agent based on the service
   definition's `node` field.
3. **Detect public hostnames** in service definitions and coordinate edge
   routing with the edge node's agent.
4. **Validate public hostnames** against a configured allowlist of domains
   (e.g., `metacircular.net`, `wntrmute.net`).
5. **Resolve edge nodes** by checking DNS CNAME records to determine which
   node handles public traffic for a given hostname.
6. **Coordinate undeploy** across nodes: tear down the service on the
   worker, then clean up edge routes on the edge node.
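The allowlist check in responsibility 4 amounts to a suffix match against the configured domains. A minimal sketch in Go -- the function name is illustrative, not taken from the mcp-master code:

```go
package main

import (
	"fmt"
	"strings"
)

// hostnameAllowed reports whether a public hostname is one of the allowed
// domains itself or a subdomain of one. Matching on "."+domain avoids
// accepting lookalikes such as "evilmetacircular.net".
func hostnameAllowed(hostname string, allowed []string) bool {
	for _, d := range allowed {
		if hostname == d || strings.HasSuffix(hostname, "."+d) {
			return true
		}
	}
	return false
}

func main() {
	allowed := []string{"metacircular.net", "wntrmute.net"}
	fmt.Println(hostnameAllowed("mcq.metacircular.net", allowed)) // true
	fmt.Println(hostnameAllowed("mcq.example.com", allowed))      // false
}
```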
### What the Master Does NOT Do

- Store container state (agents own their registries).
- Manage container lifecycle directly (agents do this).
- Run containers (the co-located agent does).
- Replace the agent on any node.

### Master Configuration

```toml
[server]
grpc_addr = "100.x.x.x:9555" # master listens on overlay
tls_cert = "/srv/mcp-master/certs/cert.pem"
tls_key = "/srv/mcp-master/certs/key.pem"

[mcias]
server_url = "https://mcias.metacircular.net:8443"
service_name = "mcp-master"

# Allowed public domains. Hostnames in service definitions must fall
# under one of these suffixes.
[edge]
allowed_domains = ["metacircular.net", "wntrmute.net"]

# Node registry. The master knows about all nodes.
[[nodes]]
name = "straylight"
address = "100.x.x.x:9444"
role = "master"

[[nodes]]
name = "rift"
address = "100.95.252.120:9444"
role = "worker"

[[nodes]]
name = "svc"
address = "100.x.x.x:9444"
role = "edge"
```

---
## Edge Routing

The core v2 feature: when a service declares a public hostname, the
master automatically provisions the edge route.

### Service Definition

Public hostnames are declared in the route's `hostname` field. The
master distinguishes public from internal hostnames by checking whether
they fall under a `.svc.mcp.` subdomain:

- `mcq.svc.mcp.metacircular.net` → internal (handled by local mc-proxy)
- `mcq.metacircular.net` → public (requires edge routing)
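This split can be expressed as a one-line predicate; the helper name is hypothetical, not from the codebase:

```go
package main

import (
	"fmt"
	"strings"
)

// isPublicHostname reports whether a route hostname needs edge routing.
// Internal hostnames live under a ".svc.mcp." subdomain and are handled
// by the local mc-proxy; everything else is treated as public.
func isPublicHostname(hostname string) bool {
	return !strings.Contains(hostname, ".svc.mcp.")
}

func main() {
	fmt.Println(isPublicHostname("mcq.svc.mcp.metacircular.net")) // false (internal)
	fmt.Println(isPublicHostname("mcq.metacircular.net"))         // true (public)
}
```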
```toml
name = "mcq"
node = "rift"
active = true

[[components]]
name = "mcq"
image = "mcr.svc.mcp.metacircular.net:8443/mcq:v0.4.0"
volumes = ["/srv/mcq:/srv/mcq"]
cmd = ["server", "--config", "/srv/mcq/mcq.toml"]

# Internal route: handled by rift's mc-proxy.
[[components.routes]]
name = "internal"
port = 8443
mode = "l7"

# Public route: master detects this and sets up edge routing on svc.
[[components.routes]]
name = "public"
port = 8443
mode = "l7"
hostname = "mcq.metacircular.net"
```

### Deploy Flow with Edge Routing

When the master receives `Deploy(mcq)`:
1. **Route to worker**: Master sends a `Deploy` RPC to rift's agent with
   the full service spec. Rift's agent deploys the container and
   registers mc-proxy routes for all hostnames (both internal and public)
   on its local mc-proxy.

2. **Detect public hostnames**: Master inspects the service spec for
   hostnames that are not `.svc.mcp.` subdomains.

3. **Validate domains**: Master checks that `mcq.metacircular.net` falls
   under an allowed domain (`metacircular.net` ✓).

4. **Resolve edge node**: Master performs a DNS lookup for
   `mcq.metacircular.net`. If it's a CNAME to `svc.metacircular.net`,
   the master resolves `svc.metacircular.net` to identify the edge node
   as `svc`. If DNS is not yet configured (no CNAME), the master uses
   the default edge node from config.

5. **Set up edge route**: Master sends a `SetupEdgeRoute` RPC to svc's
   agent:

   ```
   SetupEdgeRoute(
     hostname: "mcq.metacircular.net"
     backend_hostname: "mcq.svc.mcp.metacircular.net"
     backend_port: 8443
   )
   ```

6. **Svc agent provisions**: On receiving `SetupEdgeRoute`, svc's agent:

   a. Provisions a TLS certificate from Metacrypt for
      `mcq.metacircular.net`.

   b. Registers an L7 route in its local mc-proxy:
      `mcq.metacircular.net:443 → mcq.svc.mcp.metacircular.net:8443`
      with the provisioned cert.

7. **Master records the edge route** in its own registry for undeploy
   cleanup.
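Steps 1-5 of the deploy flow amount to a planning pass over the service spec. A rough sketch of that pass as a pure function; the types only mirror the shape of a service definition and are not the actual mcp-master structs:

```go
package main

import (
	"fmt"
	"strings"
)

// Route and Service mirror the shape of a service definition for this
// sketch only; they are not the real mcp-master types.
type Route struct {
	Hostname string
	Port     int
}

type Service struct {
	Name   string
	Node   string
	Routes []Route
}

// Action is one agent RPC the master will issue.
type Action struct {
	Verb string // "Deploy" or "SetupEdgeRoute"
	Node string
	Arg  string
}

// planDeploy turns a service spec into the ordered deploy-flow actions:
// deploy on the assigned worker first, then one SetupEdgeRoute per public
// hostname (anything not under ".svc.mcp.") that passes the allowlist.
func planDeploy(s Service, edgeNode string, allowed []string) ([]Action, error) {
	actions := []Action{{Verb: "Deploy", Node: s.Node, Arg: s.Name}}
	for _, r := range s.Routes {
		if r.Hostname == "" || strings.Contains(r.Hostname, ".svc.mcp.") {
			continue // internal route: the worker's local mc-proxy handles it
		}
		ok := false
		for _, d := range allowed {
			if r.Hostname == d || strings.HasSuffix(r.Hostname, "."+d) {
				ok = true
				break
			}
		}
		if !ok {
			return nil, fmt.Errorf("hostname %s is not under an allowed domain", r.Hostname)
		}
		actions = append(actions, Action{Verb: "SetupEdgeRoute", Node: edgeNode, Arg: r.Hostname})
	}
	return actions, nil
}

func main() {
	spec := Service{
		Name: "mcq", Node: "rift",
		Routes: []Route{{Port: 8443}, {Hostname: "mcq.metacircular.net", Port: 8443}},
	}
	actions, err := planDeploy(spec, "svc", []string{"metacircular.net"})
	fmt.Println(actions, err)
}
```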
### Undeploy Flow

When the master receives `Undeploy(mcq)`:

1. **Look up edge routes**: Master checks its registry for edge routes
   associated with mcq.
2. **Remove edge route**: Master sends `RemoveEdgeRoute(mcq.metacircular.net)`
   to svc's agent. Svc's agent removes the mc-proxy route and cleans up
   the cert.
3. **Undeploy on worker**: Master sends an `Undeploy` RPC to rift's agent.
   Rift's agent tears down the container, routes, DNS, and certs as in v1.
### Edge Node DNS Resolution

The master determines which edge node handles a public hostname by
checking DNS:

1. Look up `mcq.metacircular.net` → CNAME `svc.metacircular.net`
2. Look up `svc.metacircular.net` → IP address
3. Match the IP against known edge nodes

If no CNAME exists yet (operator hasn't set it up), the master warns but
does not fail. The operator sets up DNS manually at Hurricane Electric.
The master can provide a `mcp dns check` command that verifies all public
hostnames resolve correctly.
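The three lookup steps can be sketched as follows. The lookup functions are injected so the logic runs without live DNS; a real implementation would presumably wrap `net.LookupCNAME` and `net.LookupHost`, and the IP below is a placeholder from the documentation range, not a real node address:

```go
package main

import "fmt"

// resolveEdgeNode follows the public hostname's CNAME, resolves the
// target to an IP, and matches that IP against the known edge nodes.
// Falling back to defaultEdge when no CNAME exists matches the
// "warn but do not fail" behavior described above.
func resolveEdgeNode(
	hostname string,
	lookupCNAME func(string) (string, error),
	lookupHost func(string) ([]string, error),
	edgeNodes map[string]string, // node name -> IP
	defaultEdge string,
) (string, error) {
	target, err := lookupCNAME(hostname)
	if err != nil || target == hostname {
		return defaultEdge, nil // no CNAME yet: use the configured default
	}
	ips, err := lookupHost(target)
	if err != nil {
		return defaultEdge, nil
	}
	for name, ip := range edgeNodes {
		for _, got := range ips {
			if got == ip {
				return name, nil
			}
		}
	}
	return "", fmt.Errorf("%s resolves to an unknown edge node", hostname)
}

func main() {
	cname := func(string) (string, error) { return "svc.metacircular.net", nil }
	host := func(string) ([]string, error) { return []string{"203.0.113.10"}, nil }
	edge, _ := resolveEdgeNode("mcq.metacircular.net", cname, host,
		map[string]string{"svc": "203.0.113.10"}, "svc")
	fmt.Println(edge) // svc
}
```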
---

## Agent Changes for v2

### New RPCs

```protobuf
// Edge routing -- called by master on edge nodes.
rpc SetupEdgeRoute(SetupEdgeRouteRequest) returns (SetupEdgeRouteResponse);
rpc RemoveEdgeRoute(RemoveEdgeRouteRequest) returns (RemoveEdgeRouteResponse);
rpc ListEdgeRoutes(ListEdgeRoutesRequest) returns (ListEdgeRoutesResponse);

message SetupEdgeRouteRequest {
  string hostname = 1;          // public hostname (e.g. "mcq.metacircular.net")
  string backend_hostname = 2;  // internal hostname (e.g. "mcq.svc.mcp.metacircular.net")
  int32 backend_port = 3;       // port on the worker's mc-proxy (e.g. 8443)
}

message SetupEdgeRouteResponse {}

message RemoveEdgeRouteRequest {
  string hostname = 1;
}

message RemoveEdgeRouteResponse {}

message ListEdgeRoutesRequest {}

message ListEdgeRoutesResponse {
  repeated EdgeRoute routes = 1;
}

message EdgeRoute {
  string hostname = 1;
  string backend_hostname = 2;
  int32 backend_port = 3;
  string cert_serial = 4;
  string cert_expires = 5;
}
```
### SetupEdgeRoute Implementation

When the agent receives `SetupEdgeRoute`:

1. **Resolve backend address**: The agent resolves `backend_hostname` to
   an IP address (the worker node's overlay IP). It uses the port from
   the request to form the backend address (e.g., `100.95.252.120:8443`).

2. **Provision TLS cert**: The agent calls Metacrypt's CA API to issue a
   certificate for the public hostname. The cert and key are written to
   the mc-proxy cert directory.

3. **Register mc-proxy route**: The agent adds an L7 route to its local
   mc-proxy:
   - Listener: `:443`
   - Hostname: `mcq.metacircular.net`
   - Backend: `100.95.252.120:8443`
   - Mode: `l7`
   - TLS cert/key: the provisioned cert
   - Backend TLS: `true` (worker's mc-proxy serves TLS)

4. **Record the edge route** in the agent's local registry for listing
   and cleanup.
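Step 1 can be sketched as a small helper. The resolver is injected here for testability; the real agent would presumably call `net.LookupHost`, and the helper name is illustrative:

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// backendAddr forms the mc-proxy backend address for an edge route from
// the request's backend_hostname and backend_port. net.JoinHostPort
// handles bracketing should the overlay ever hand back an IPv6 address.
func backendAddr(hostname string, port int, lookupHost func(string) ([]string, error)) (string, error) {
	ips, err := lookupHost(hostname)
	if err != nil || len(ips) == 0 {
		return "", fmt.Errorf("resolve %s: %v", hostname, err)
	}
	return net.JoinHostPort(ips[0], strconv.Itoa(port)), nil
}

func main() {
	// Fake resolver standing in for DNS on the overlay.
	fake := func(string) ([]string, error) { return []string{"100.95.252.120"}, nil }
	addr, _ := backendAddr("mcq.svc.mcp.metacircular.net", 8443, fake)
	fmt.Println(addr) // 100.95.252.120:8443
}
```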
### Cert Provisioning on All Agents

All agents need Metacrypt configuration to provision certs:

```toml
[metacrypt]
server_url = "https://metacrypt.svc.mcp.metacircular.net:8443"
ca_cert = "/srv/mcp/certs/metacircular-ca.pem"
mount = "pki"
issuer = "infra"
token_path = "/srv/mcp/metacrypt-token"
```

The svc agent provisions certs for public hostnames. The rift agent
provisions certs for internal hostnames. Both use the same Metacrypt API.

---
## CLI Changes for v2

The CLI's `[[nodes]]` config is replaced by a single master address:

```toml
[master]
address = "100.x.x.x:9555"

[mcias]
server_url = "https://mcias.metacircular.net:8443"
service_name = "mcp"

[auth]
token_path = "/home/kyle/.config/mcp/token"

[services]
dir = "/home/kyle/.config/mcp/services"
```

Commands that currently iterate over nodes (`mcp ps`, `mcp list`,
`mcp node list`) instead query the master, which aggregates from all
agents.

Service definition files remain on the operator's workstation. The CLI
pushes them to the master on `mcp deploy` and `mcp sync`.

---
## Migration Plan

### Phase 1: Deploy mcp-agent on svc

svc currently has no mcp-agent. Install and configure one:

- Create `mcp` user on svc (Debian: `install-agent.sh`)
- Install mcp-agent binary
- Configure with Metacrypt and mc-proxy socket access
- Verify with `mcp node list` (svc shows up)

### Phase 2: Add edge routing RPCs to agents

Implement `SetupEdgeRoute`, `RemoveEdgeRoute`, `ListEdgeRoutes` on the
agent. Test by calling them directly from the CLI before the master exists.

### Phase 3: Build mcp-master

Start with the core coordination loop:

1. Accept `Deploy` from CLI
2. Forward to the correct agent
3. Detect public hostnames
4. Call `SetupEdgeRoute` on the edge agent

### Phase 4: Provision straylight

New node (straylight) takes over as master and hosts core infrastructure:

1. Deploy mcp-agent on straylight
2. Migrate mcns, metacrypt, mcr from rift to straylight
3. Deploy mcp-master on straylight
4. Update CLI config to point at master

### Phase 5: Cut over

- Update DNS to point `*.svc.mcp.metacircular.net` at straylight
- Update service definitions to use new node assignments
- Verify all services via `mcp ps` and public endpoint tests

---
## Open Questions

1. **Master HA**: mcp-master is a single point of failure. For v2, this
   is acceptable (operator can SSH to agents directly if master is down).
   v3 could add master replication or make agents self-sufficient for
   local operations when the master is unreachable.

2. **Service placement**: v2 still requires explicit `node` assignment
   in service definitions. Automatic placement based on resource
   availability is a future concern.

3. **Cert renewal on edge**: Edge certs have a 90-day TTL. The edge
   agent needs a renewal loop (similar to the existing `renewWindow`
   check in `EnsureCert`) or the master needs to periodically re-check
   edge routes.

4. **mc-proxy database vs config**: mc-proxy persists routes in SQLite,
   which can diverge from the TOML config. The agent should be the sole
   manager of mc-proxy routes via the gRPC admin API, not the TOML file.
   This avoids the stale-database problem encountered during v1
   operations on svc.

5. **straylight hardware**: What hardware is straylight? Does it run
   NixOS or Debian? Does it use rootless podman like rift?

6. **Mono-repo for core infrastructure**: The current layout has each
   service as a separate git repo under `~/src/metacircular/`. A
   mono-repo for core infrastructure (mcp, mcp-master, mcns, metacrypt,
   mcr, mc-proxy, mcdsl) would simplify coordinated changes (e.g., a
   proto change that touches agent + CLI + mc-proxy client), eliminate
   the `uses_mcdsl` build flag / vendoring, enable a single CI pipeline,
   and allow atomic platform versioning (one tag per release). Non-core
   application services (exo, mcq, mcdoc, sgard, kls, mcat) would
   remain as separate repos with independent release cadences. This is
   a large migration best tackled after straylight is running and the
   master exists, when the build/deploy pipeline is already being
   reorganized.
```diff
@@ -28,17 +28,26 @@ func routeCmd() *cobra.Command {
 		},
 	}
 
+	var (
+		routeMode  string
+		backendTLS bool
+		tlsCert    string
+		tlsKey     string
+	)
+
 	add := &cobra.Command{
 		Use:   "add <listener> <hostname> <backend>",
 		Short: "Add a route to mc-proxy",
-		Long:  "Add a route. Example: mcp route add -n rift :443 mcq.metacircular.net 100.95.252.120:443",
+		Long:  "Add a route. Example: mcp route add -n rift :443 mcq.svc.mcp.metacircular.net 127.0.0.1:48080 --mode l7 --tls-cert /srv/mc-proxy/certs/mcq.pem --tls-key /srv/mc-proxy/certs/mcq.key",
 		Args:  cobra.ExactArgs(3),
 		RunE: func(_ *cobra.Command, args []string) error {
-			return runRouteAdd(nodeName, args)
+			return runRouteAdd(nodeName, args, routeMode, backendTLS, tlsCert, tlsKey)
 		},
 	}
-	add.Flags().String("mode", "l4", "route mode (l4 or l7)")
-	add.Flags().Bool("backend-tls", false, "re-encrypt traffic to backend")
+	add.Flags().StringVar(&routeMode, "mode", "l4", "route mode (l4 or l7)")
+	add.Flags().BoolVar(&backendTLS, "backend-tls", false, "re-encrypt traffic to backend")
+	add.Flags().StringVar(&tlsCert, "tls-cert", "", "path to TLS cert on the node (required for l7)")
+	add.Flags().StringVar(&tlsKey, "tls-key", "", "path to TLS key on the node (required for l7)")
 
 	remove := &cobra.Command{
 		Use:   "remove <listener> <hostname>",
```
```diff
@@ -138,7 +147,7 @@ func printRoutes(nodeName string, resp *mcpv1.ListProxyRoutesResponse) {
 	}
 }
 
-func runRouteAdd(nodeName string, args []string) error {
+func runRouteAdd(nodeName string, args []string, mode string, backendTLS bool, tlsCert, tlsKey string) error {
 	if nodeName == "" {
 		return fmt.Errorf("--node is required")
 	}
```
```diff
@@ -166,12 +175,16 @@ func runRouteAdd(nodeName string, args []string) error {
 		ListenerAddr: args[0],
 		Hostname:     args[1],
 		Backend:      args[2],
+		Mode:         mode,
+		BackendTls:   backendTLS,
+		TlsCert:      tlsCert,
+		TlsKey:       tlsKey,
 	})
 	if err != nil {
 		return fmt.Errorf("add route: %w", err)
 	}
 
-	fmt.Printf("Added route: %s → %s on %s (%s)\n", args[1], args[2], args[0], nodeName)
+	fmt.Printf("Added route: %s %s → %s on %s (%s)\n", mode, args[1], args[2], args[0], nodeName)
 	return nil
 }
```
```diff
@@ -2815,6 +2815,8 @@ type AddProxyRouteRequest struct {
 	Backend       string `protobuf:"bytes,3,opt,name=backend,proto3" json:"backend,omitempty"`
 	Mode          string `protobuf:"bytes,4,opt,name=mode,proto3" json:"mode,omitempty"` // "l4" or "l7"
 	BackendTls    bool   `protobuf:"varint,5,opt,name=backend_tls,json=backendTls,proto3" json:"backend_tls,omitempty"`
+	TlsCert       string `protobuf:"bytes,6,opt,name=tls_cert,json=tlsCert,proto3" json:"tls_cert,omitempty"` // path to TLS cert (required for l7)
+	TlsKey        string `protobuf:"bytes,7,opt,name=tls_key,json=tlsKey,proto3" json:"tls_key,omitempty"` // path to TLS key (required for l7)
 	unknownFields protoimpl.UnknownFields
 	sizeCache     protoimpl.SizeCache
 }
```
```diff
@@ -2884,6 +2886,20 @@ func (x *AddProxyRouteRequest) GetBackendTls() bool {
 	return false
 }
 
+func (x *AddProxyRouteRequest) GetTlsCert() string {
+	if x != nil {
+		return x.TlsCert
+	}
+	return ""
+}
+
+func (x *AddProxyRouteRequest) GetTlsKey() string {
+	if x != nil {
+		return x.TlsKey
+	}
+	return ""
+}
+
 type AddProxyRouteResponse struct {
 	state         protoimpl.MessageState `protogen:"open.v1"`
 	unknownFields protoimpl.UnknownFields
```
```diff
@@ -3198,14 +3214,16 @@ const file_proto_mcp_v1_mcp_proto_rawDesc = "" +
 	"\x11total_connections\x18\x02 \x01(\x03R\x10totalConnections\x129\n" +
 	"\n" +
 	"started_at\x18\x03 \x01(\v2\x1a.google.protobuf.TimestampR\tstartedAt\x127\n" +
-	"\tlisteners\x18\x04 \x03(\v2\x19.mcp.v1.ProxyListenerInfoR\tlisteners\"\xa6\x01\n" +
+	"\tlisteners\x18\x04 \x03(\v2\x19.mcp.v1.ProxyListenerInfoR\tlisteners\"\xda\x01\n" +
 	"\x14AddProxyRouteRequest\x12#\n" +
 	"\rlistener_addr\x18\x01 \x01(\tR\flistenerAddr\x12\x1a\n" +
 	"\bhostname\x18\x02 \x01(\tR\bhostname\x12\x18\n" +
 	"\abackend\x18\x03 \x01(\tR\abackend\x12\x12\n" +
 	"\x04mode\x18\x04 \x01(\tR\x04mode\x12\x1f\n" +
 	"\vbackend_tls\x18\x05 \x01(\bR\n" +
-	"backendTls\"\x17\n" +
+	"backendTls\x12\x19\n" +
+	"\btls_cert\x18\x06 \x01(\tR\atlsCert\x12\x17\n" +
+	"\atls_key\x18\a \x01(\tR\x06tlsKey\"\x17\n" +
 	"\x15AddProxyRouteResponse\"Z\n" +
 	"\x17RemoveProxyRouteRequest\x12#\n" +
 	"\rlistener_addr\x18\x01 \x01(\tR\flistenerAddr\x12\x1a\n" +
```
```diff
@@ -134,7 +134,8 @@ func (a *Agent) deployComponent(ctx context.Context, serviceName string, cs *mcp
 				Error: fmt.Sprintf("allocate route ports: %v", err),
 			}
 		}
-		runSpec.Ports = ports
+		// Merge explicit ports from the spec with route-allocated ports.
+		runSpec.Ports = append(cs.GetPorts(), ports...)
 		runSpec.Env = append(runSpec.Env, env...)
 	} else {
 		// Legacy: use ports directly from the spec.
```
```diff
@@ -69,6 +69,8 @@ func (a *Agent) AddProxyRoute(ctx context.Context, req *mcpv1.AddProxyRouteReque
 		Backend:    req.GetBackend(),
 		Mode:       req.GetMode(),
 		BackendTLS: req.GetBackendTls(),
+		TLSCert:    req.GetTlsCert(),
+		TLSKey:     req.GetTlsKey(),
 	}
 
 	if err := a.Proxy.AddRoute(ctx, req.GetListenerAddr(), route); err != nil {
```
```diff
@@ -362,6 +362,8 @@ message AddProxyRouteRequest {
  string backend = 3;
  string mode = 4; // "l4" or "l7"
  bool backend_tls = 5;
+ string tls_cert = 6; // path to TLS cert (required for l7)
+ string tls_key = 7;  // path to TLS key (required for l7)
 }
 
 message AddProxyRouteResponse {}
```