runtime/qemu: guard VM liveness against PID reuse

pidOf trusted any live PID from the pidfile. After a VM is killed (e.g. by an agent-restart cgroup kill), its stale pidfile can hold a PID that the kernel has since reused for an unrelated process. The VM was then falsely reported as "running", so Recover skipped it and it stayed dead in drift. pidOf now confirms that /proc/<pid>/cmdline references the VM's state dir before trusting the PID.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@@ -485,6 +485,16 @@ func (q *QEMU) pidOf(name string) int {
 	if err := syscall.Kill(pid, 0); err != nil {
 		return 0
 	}
+	// Guard against PID reuse: a stale pidfile from a VM that was killed (e.g.
+	// by an agent-restart cgroup kill) may hold a PID that the kernel has since
+	// reused for an unrelated process. Confirm the live process is in fact this
+	// VM's QEMU by checking its cmdline references the VM's state dir (every
+	// launch passes -pidfile/-serial/-qmp paths under vmDir). Without this, a
+	// dead VM reports "running" and is never recovered.
+	cmdline, err := os.ReadFile(fmt.Sprintf("/proc/%d/cmdline", pid))
+	if err != nil || !strings.Contains(string(cmdline), q.vmDir(name)) {
+		return 0
+	}
 	return pid
 }