Add boot sequencing to agent

The agent reads [[boot.sequence]] stages from its config and starts
services in dependency order before accepting gRPC connections. Each
stage waits for its services to pass health checks before proceeding:

- tcp: TCP connect to the container's mapped port
- grpc: standard gRPC health check

Foundation stage (stage 0): blocks and retries indefinitely if health
checks fail, because all downstream services depend on it.
Non-foundation stages: log a warning and proceed on failure.

Reuses the recover logic to start containers from the registry, then
runs the stage's health check to verify readiness.

Config example:
  [[boot.sequence]]
  name = "foundation"
  services = ["mcias", "mcns"]
  timeout = "120s"
  health = "tcp"

Architecture v2 Phase 4 feature.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 11:53:11 -07:00
parent 9d543998dc
commit fa4d022bc1
3 changed files with 231 additions and 0 deletions

@@ -119,6 +119,18 @@ func Run(cfg *config.AgentConfig, version string) error {
		"runtime", cfg.Agent.ContainerRuntime,
	)

	// Run boot sequence before starting the gRPC server.
	// On the master node, this starts foundation services (MCIAS, MCNS)
	// before core services, ensuring dependencies are met.
	if len(cfg.Boot.Sequence) > 0 {
		bootCtx, bootCancel := context.WithCancel(context.Background())
		defer bootCancel()
		if err := a.RunBootSequence(bootCtx); err != nil {
			logger.Error("boot sequence failed", "err", err)
			// Continue starting the gRPC server; partial boot is better than no agent.
		}
	}

	mon.Start()

	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)