Add boot sequencing to agent

The agent reads [[boot.sequence]] stages from its config and starts
services in dependency order before accepting gRPC connections. Each
stage waits for its services to pass health checks before proceeding:

- tcp: TCP connect to the container's mapped port
- grpc: standard gRPC health check

Foundation stage (stage 0): blocks and retries indefinitely if health
fails, because all downstream services depend on it.
Non-foundation stages: log warning and proceed on failure.

Uses the recover logic to start containers from the registry, then
health-checks to verify readiness.

Config example:

    [[boot.sequence]]
    name = "foundation"
    services = ["mcias", "mcns"]
    timeout = "120s"
    health = "tcp"

Architecture v2 Phase 4 feature.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 11:53:11 -07:00
parent 9d543998dc
commit fa4d022bc1
3 changed files with 231 additions and 0 deletions
--- a/internal/agent/agent.go
+++ b/internal/agent/agent.go
@@ -119,6 +119,18 @@ func Run(cfg *config.AgentConfig, version string) error {
 		"runtime", cfg.Agent.ContainerRuntime,
 	)

+	// Run boot sequence before starting the gRPC server.
+	// On the master node, this starts foundation services (MCIAS, MCNS)
+	// before core services, ensuring dependencies are met.
+	if len(cfg.Boot.Sequence) > 0 {
+		bootCtx, bootCancel := context.WithCancel(context.Background())
+		defer bootCancel()
+		if err := a.RunBootSequence(bootCtx); err != nil {
+			logger.Error("boot sequence failed", "err", err)
+			// Continue starting the gRPC server — partial boot is better than no agent.
+		}
+	}
+
 	mon.Start()

 	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)