Emit JobRunTerminated event when executor observes pod terminal phase by dejanzele · Pull Request #4920 · armadaproject/armada

dejanzele · 2026-05-19T12:00:36Z

When the scheduler preempts a run, the pod takes a variable grace period (5s to 5min depending on TerminationGracePeriodSeconds) to actually exit. preempted_timestamp records when the scheduler made the decision, but nothing records when the kubelet finished stopping the containers, so any future retry-policy consumer has no per-run signal to wait on.

This PR adds that signal. It does not consume it; the retry-gate is a separate follow-up.

What this PR does

Adds a new JobRunTerminated event emitted by the executor when it observes a pod reach PodSucceeded or PodFailed. The scheduleringester writes runs.terminated_timestamp from this event when the column is NULL, so the value reflects the kubelet-observed stop time rather than the scheduler decision time.

To preserve the existing Failed-state histogram, this PR also adds a failed_timestamp column (migration 039). MarkRunsFailed writes to failed_timestamp instead of terminated_timestamp, and state_metrics.go reads FailedTime() for the failed branch. The semantic split:

failed_timestamp: when armada decided the run failed (scheduler event-time).
terminated_timestamp: when the kubelet observed the pod stop (executor-observed).

For organic Succeeded or Cancelled runs the existing writers fill terminated_timestamp first and the JobRunTerminated write is a no-op. For preempted or organic-failed runs the column is NULL after the initial write, so JobRunTerminated fills it with the real kubelet-observed time.

greptile-apps · 2026-05-19T12:09:20Z

Greptile Summary

This PR introduces a new JobRunTerminated proto event and the infrastructure to emit, ingest, and store it, filling terminated_timestamp with the kubelet-observed pod stop time rather than the scheduler decision time. A companion failed_timestamp column (migration 039) separates "when Armada decided the run failed" from "when the kubelet stopped the pod", preserving existing metrics semantics.

Executor side: attemptToReportJobRunTerminatedEvent is called inline from the informer Add/Update handlers and from a new reconciliation loop in ReportMissingJobEvents; the kubelet FinishedAt time flows through LatestContainerFinishedAt → event.Created → MarkRunsTerminated.
Scheduler-ingester side: handleJobRunTerminated produces MarkRunsTerminated, which uses an IS NULL guard (runsSetTimestampIfNullStmt) so succeeded/cancelled runs keep their existing timestamp, while preempted and failed runs get the kubelet-observed time filled in.
Metrics side: the Failed histogram branch now reads FailedTime() (decision time) and falls back to TerminatedTime() for pre-migration rows where failed_timestamp is NULL.

Confidence Score: 5/5

This PR is safe to merge. The IS NULL guard correctly arbitrates between all writers of terminated_timestamp, the FailedTime/TerminatedTime semantic split is sound, and the pre-migration fallback preserves existing histogram behaviour.

The changes are well-scoped and internally consistent. The IS NULL guard in runsSetTimestampIfNullStmt correctly defers to earlier writers (MarkRunsSucceeded, MarkJobsCancelled) while letting the kubelet-observed time fill in for preempted and organically-failed runs. The FailedTime() → TerminatedTime() fallback in state_metrics.go handles pre-migration rows cleanly. LatestContainerFinishedAt documents its node-loss edge case and falls back safely. Tests cover the main guard paths and reconciliation logic.

No files require special attention.

Important Files Changed

Filename	Overview
internal/executor/reporter/event.go	Adds CreateJobRunTerminatedEvent using LatestContainerFinishedAt for the event timestamp — correctly reads kubelet-reported FinishedAt rather than wall-clock time.
internal/executor/util/pod_util.go	Adds LatestContainerFinishedAt (scans container and init-container statuses, falls back to time.Now() for node-loss pods) and HasJobRunTerminatedBeenReported; well-documented and tested.
internal/executor/service/job_state_reporter.go	Wires attemptToReportJobRunTerminatedEvent into Add/UpdateFunc handlers and adds a reconciliation loop in ReportMissingJobEvents; idempotent via reported_terminated annotation.
internal/scheduleringester/schedulerdb.go	Changes MarkRunsFailed to write failed_timestamp instead of terminated_timestamp; adds MarkRunsTerminated case using runsSetTimestampIfNullStmt with a clean IS NULL guard.
internal/scheduler/metrics/state_metrics.go	Failed-histogram branch now uses FailedTime() with TerminatedTime() fallback for pre-migration rows; correctly preserves semantics across the migration boundary.
internal/scheduler/database/migrations/039_add_runs_failed_timestamp.sql	Adds failed_timestamp nullable timestamptz column with IF NOT EXISTS guard; safe to apply.
internal/scheduleringester/dbops.go	Adds MarkRunsTerminated map type with Merge, CanBeAppliedBefore, and GetOperation implementations consistent with sibling operations.
internal/scheduleringester/instructions.go	Routes JobRunTerminated events to handleJobRunTerminated which emits MarkRunsTerminated using event.Created (the kubelet FinishedAt time) as the stored timestamp.
internal/scheduler/metrics.go	Fixes inflated queued-time window: falls back to FailedTime() when TerminatedTime() is NULL (between MarkRunsFailed and MarkRunsTerminated arrival).
internal/scheduleringester/schedulerdb_test.go	New TestMarkRunsTerminated covers preempted, succeeded, and failed seed states; MarkRunsFailed assertion updated to check failed_timestamp instead of terminated_timestamp.

Sequence Diagram

sequenceDiagram
    participant K8s as Kubernetes/Kubelet
    participant Exec as Executor
    participant Kafka as Kafka/Pulsar
    participant Ingest as SchedulerIngester
    participant DB as Scheduler DB

    K8s->>Exec: Pod → PodSucceeded/PodFailed (informer event)
    Exec->>Exec: reportCurrentStatus/reportStatusUpdate
    Exec->>Exec: attemptToReportJobRunTerminatedEvent
    Exec->>Kafka: "JobRunTerminated {Created=LatestContainerFinishedAt}"

    Note over Exec: Separately, scheduler also emits:
    Exec->>Kafka: JobRunErrors (for failed/preempted runs)

    Kafka->>Ingest: JobRunErrors
    Ingest->>DB: "MarkRunsFailed → failed=true, failed_timestamp=event_time"

    Kafka->>Ingest: JobRunTerminated
    Ingest->>DB: "MarkRunsTerminated → terminated_timestamp=kubelet_time (WHERE IS NULL)"

    Note over DB: Succeeded runs: MarkRunsSucceeded already set terminated_timestamp → IS NULL blocks
    Note over DB: Failed/preempted runs: terminated_timestamp was NULL → IS NULL allows kubelet time in

_{Reviews (24): Last reviewed commit: "Emit JobRunTerminated event when executo..." | Re-trigger Greptile}

dejanzele · 2026-05-20T16:50:27Z

@greptileai

dejanzele · 2026-05-22T09:52:22Z

@greptileai

`TestEventServer_GetJobSetEvents_ErrorIfMissing` (4 subtests) and `TestEventServer_GetJobSetEvents_Permissions` (3 subtests) share a single 5s `armadacontext.WithTimeout` across all their subtests. Each subtest spins up a fresh lookout DB and runs every migration via `withEventServer` and `WithLookoutDb`, so the per-subtest cost is not trivial. On a slow CI runner the cumulative cost blew the 5s budget. We hit this on PR #4920 where 4 subtests of `_ErrorIfMissing` took 5.74s combined and the last one died with `context deadline exceeded` inside `PostgresQueueRepository.upsertQueue`. Signed-off-by: Dejan Zele Pejchev <pejcev.dejan@gmail.com>

dejanzele · 2026-05-26T13:34:16Z

@greptileai

Signed-off-by: Dejan Zele Pejchev <pejcev.dejan@gmail.com>

`TestEventServer_GetJobSetEvents_ErrorIfMissing` (4 subtests) and `TestEventServer_GetJobSetEvents_Permissions` (3 subtests) share a single 5s `armadacontext.WithTimeout` across all their subtests. Each subtest spins up a fresh lookout DB and runs every migration via `withEventServer` and `WithLookoutDb`, so the per-subtest cost is not trivial. On a slow CI runner the cumulative cost blew the 5s budget. We hit this on PR #4920 where 4 subtests of `_ErrorIfMissing` took 5.74s combined and the last one died with `context deadline exceeded` inside `PostgresQueueRepository.upsertQueue`. Signed-off-by: Dejan Zele Pejchev <pejcev.dejan@gmail.com> Signed-off-by: Nikola Jokic <jokicnikola07@gmail.com>

greptile-apps Bot reviewed May 19, 2026

View reviewed changes

Comment thread internal/executor/reporter/event.go

dejanzele force-pushed the job-run-terminated branch 6 times, most recently from 750bc95 to 915e9ee Compare May 20, 2026 13:34

greptile-apps Bot reviewed May 20, 2026

View reviewed changes

Comment thread internal/scheduleringester/schedulerdb.go Outdated

dejanzele force-pushed the job-run-terminated branch 3 times, most recently from 03170a8 to fbe4c54 Compare May 20, 2026 16:45

dejanzele force-pushed the job-run-terminated branch 6 times, most recently from ff69b00 to a3d367d Compare May 21, 2026 11:34

dejanzele mentioned this pull request May 21, 2026

Fix flaky TestEventServer_GetJobSetEvents context budget #4925

Merged

dejanzele force-pushed the job-run-terminated branch from a3d367d to c9fda95 Compare May 21, 2026 17:52

greptile-apps Bot reviewed May 21, 2026

View reviewed changes

Comment thread internal/scheduler/metrics/state_metrics.go

dejanzele force-pushed the job-run-terminated branch 2 times, most recently from 6b55e34 to 5ff969a Compare May 22, 2026 09:41

dejanzele force-pushed the job-run-terminated branch 3 times, most recently from 2731e41 to 6ae99e8 Compare May 22, 2026 10:53

dejanzele force-pushed the job-run-terminated branch from 6ae99e8 to 540b066 Compare May 22, 2026 12:23

Emit JobRunTerminated event when executor observes pod terminal phase

39b4c26

Signed-off-by: Dejan Zele Pejchev <pejcev.dejan@gmail.com>

dejanzele force-pushed the job-run-terminated branch from 540b066 to 39b4c26 Compare May 27, 2026 07:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Emit JobRunTerminated event when executor observes pod terminal phase#4920

Emit JobRunTerminated event when executor observes pod terminal phase#4920
dejanzele wants to merge 1 commit into
armadaproject:masterfrom
dejanzele:job-run-terminated

dejanzele commented May 19, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented May 19, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

dejanzele commented May 20, 2026

Uh oh!

Uh oh!

dejanzele commented May 22, 2026

Uh oh!

dejanzele commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

dejanzele commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this PR does

Uh oh!

greptile-apps Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Uh oh!

dejanzele commented May 20, 2026

Uh oh!

Uh oh!

dejanzele commented May 22, 2026

Uh oh!

dejanzele commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

dejanzele commented May 19, 2026 •

edited

Loading

greptile-apps Bot commented May 19, 2026 •

edited

Loading