Add boot sequencing to agent

The agent reads [[boot.sequence]] stages from its config and starts
services in dependency order before accepting gRPC connections. Each
stage waits for its services to pass health checks before proceeding:

- tcp: TCP connect to the container's mapped port
- grpc: standard gRPC health check

Foundation stage (stage 0): blocks and retries indefinitely if health
fails; all downstream services depend on it.
Non-foundation stages: log warning and proceed on failure.

Uses the recover logic to start containers from the registry, then
health-checks to verify readiness.

Config example:

    [[boot.sequence]]
    name = "foundation"
    services = ["mcias", "mcns"]
    timeout = "120s"
    health = "tcp"

Architecture v2 Phase 4 feature.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 11:53:11 -07:00
parent 9d543998dc
commit fa4d022bc1
3 changed files with 231 additions and 0 deletions
--- a/internal/config/agent.go
+++ b/internal/config/agent.go
@@ -19,6 +19,23 @@ type AgentConfig struct {
 	MCNS      MCNSConfig      `toml:"mcns"`
 	Monitor   MonitorConfig   `toml:"monitor"`
 	Log       LogConfig       `toml:"log"`
+	Boot      BootConfig      `toml:"boot"`
+}
+
+// BootConfig holds the boot sequence for the master node.
+// Each stage's services must be healthy before the next stage starts.
+// Worker and edge nodes don't use this — they wait for the master.
+type BootConfig struct {
+	Sequence []BootStage `toml:"sequence"`
+}
+
+// BootStage defines a group of services that must be started and healthy
+// before the next stage begins.
+type BootStage struct {
+	Name     string   `toml:"name"`
+	Services []string `toml:"services"`
+	Timeout  Duration `toml:"timeout"`
+	Health   string   `toml:"health"` // "tcp", "grpc", or "http"
 }

 // MetacryptConfig holds the Metacrypt CA integration settings for