Emit JobRunTerminated event when executor observes pod terminal phase#4920
Emit JobRunTerminated event when executor observes pod terminal phase#4920dejanzele wants to merge 1 commit into
Conversation
Greptile SummaryThis PR introduces a new
Confidence Score: 5/5This PR is safe to merge. The IS NULL guard correctly arbitrates between all writers of terminated_timestamp, the FailedTime/TerminatedTime semantic split is sound, and the pre-migration fallback preserves existing histogram behaviour. The changes are well-scoped and internally consistent. The IS NULL guard in runsSetTimestampIfNullStmt correctly defers to earlier writers (MarkRunsSucceeded, MarkJobsCancelled) while letting the kubelet-observed time fill in for preempted and organically-failed runs. The FailedTime() → TerminatedTime() fallback in state_metrics.go handles pre-migration rows cleanly. LatestContainerFinishedAt documents its node-loss edge case and falls back safely. Tests cover the main guard paths and reconciliation logic. No files require special attention. Important Files Changed
Sequence DiagramsequenceDiagram
participant K8s as Kubernetes/Kubelet
participant Exec as Executor
participant Kafka as Kafka/Pulsar
participant Ingest as SchedulerIngester
participant DB as Scheduler DB
K8s->>Exec: Pod → PodSucceeded/PodFailed (informer event)
Exec->>Exec: reportCurrentStatus/reportStatusUpdate
Exec->>Exec: attemptToReportJobRunTerminatedEvent
Exec->>Kafka: "JobRunTerminated {Created=LatestContainerFinishedAt}"
Note over Exec: Separately, scheduler also emits:
Exec->>Kafka: JobRunErrors (for failed/preempted runs)
Kafka->>Ingest: JobRunErrors
Ingest->>DB: "MarkRunsFailed → failed=true, failed_timestamp=event_time"
Kafka->>Ingest: JobRunTerminated
Ingest->>DB: "MarkRunsTerminated → terminated_timestamp=kubelet_time (WHERE IS NULL)"
Note over DB: Succeeded runs: MarkRunsSucceeded already set terminated_timestamp → IS NULL blocks
Note over DB: Failed/preempted runs: terminated_timestamp was NULL → IS NULL allows kubelet time in
Reviews (24): Last reviewed commit: "Emit JobRunTerminated event when executo..." | Re-trigger Greptile |
750bc95 to
915e9ee
Compare
03170a8 to
fbe4c54
Compare
ff69b00 to
a3d367d
Compare
a3d367d to
c9fda95
Compare
6b55e34 to
5ff969a
Compare
2731e41 to
6ae99e8
Compare
`TestEventServer_GetJobSetEvents_ErrorIfMissing` (4 subtests) and `TestEventServer_GetJobSetEvents_Permissions` (3 subtests) share a single 5s `armadacontext.WithTimeout` across all their subtests. Each subtest spins up a fresh lookout DB and runs every migration via `withEventServer` and `WithLookoutDb`, so the per-subtest cost is not trivial. On a slow CI runner the cumulative cost blew the 5s budget. We hit this on PR #4920 where 4 subtests of `_ErrorIfMissing` took 5.74s combined and the last one died with `context deadline exceeded` inside `PostgresQueueRepository.upsertQueue`. Signed-off-by: Dejan Zele Pejchev <pejcev.dejan@gmail.com>
6ae99e8 to
540b066
Compare
Signed-off-by: Dejan Zele Pejchev <pejcev.dejan@gmail.com>
540b066 to
39b4c26
Compare
`TestEventServer_GetJobSetEvents_ErrorIfMissing` (4 subtests) and `TestEventServer_GetJobSetEvents_Permissions` (3 subtests) share a single 5s `armadacontext.WithTimeout` across all their subtests. Each subtest spins up a fresh lookout DB and runs every migration via `withEventServer` and `WithLookoutDb`, so the per-subtest cost is not trivial. On a slow CI runner the cumulative cost blew the 5s budget. We hit this on PR #4920 where 4 subtests of `_ErrorIfMissing` took 5.74s combined and the last one died with `context deadline exceeded` inside `PostgresQueueRepository.upsertQueue`. Signed-off-by: Dejan Zele Pejchev <pejcev.dejan@gmail.com> Signed-off-by: Nikola Jokic <jokicnikola07@gmail.com>
When the scheduler preempts a run, the pod takes a variable grace period (5s to 5min depending on
TerminationGracePeriodSeconds) to actually exit.preempted_timestamprecords when the scheduler made the decision, but nothing records when the kubelet finished stopping the containers, so any future retry-policy consumer has no per-run signal to wait on.This PR adds that signal. It does not consume it; the retry-gate is a separate follow-up.
What this PR does
Adds a new
JobRunTerminatedevent emitted by the executor when it observes a pod reachPodSucceededorPodFailed. The scheduleringester writesruns.terminated_timestampfrom this event when the column is NULL, so the value reflects the kubelet-observed stop time rather than the scheduler decision time.To preserve the existing Failed-state histogram, this PR also adds a
failed_timestampcolumn (migration 039).MarkRunsFailedwrites tofailed_timestampinstead ofterminated_timestamp, andstate_metrics.goreadsFailedTime()for the failed branch. The semantic split:failed_timestamp: when armada decided the run failed (scheduler event-time).terminated_timestamp: when the kubelet observed the pod stop (executor-observed).For organic Succeeded or Cancelled runs the existing writers fill
terminated_timestampfirst and theJobRunTerminatedwrite is a no-op. For preempted or organic-failed runs the column is NULL after the initial write, soJobRunTerminatedfills it with the real kubelet-observed time.