Making scheduling cycle optionally run asynchronously#4952
Conversation
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com> # Conflicts: # internal/scheduler/scheduler.go # internal/scheduler/scheduler_test.go # internal/scheduler/scheduling/scheduling_algo.go
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com> # Conflicts: # internal/scheduler/scheduler.go # internal/scheduler/scheduling/context/queue.go # internal/scheduler/scheduling/context/queue_test.go # internal/scheduler/scheduling/context/scheduling.go
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Greptile SummaryThis PR introduces an optional asynchronous scheduling mode that decouples the scheduling algorithm from the main scheduler loop, allowing job state transitions (cancel, fail, preempt) to flow through without waiting for the scheduling algo to complete. When async mode is enabled, the background goroutine operates on a
Confidence Score: 5/5The async scheduling mode is safe to merge; the state machine, cancellation path, and reconciliation logic are all well-reasoned and well-tested. The core state machine in AsyncSchedulingRunner is correct and race-free. Reset() blocks until any in-flight scheduling run has finished before returning, ensuring the next Trigger always schedules against committed state. The two findings in the reconcile path are defensive-coding concerns against non-standard SchedulingAlgo implementations; the production FairSchedulingAlgo invariants prevent them from firing today. Test coverage for the async lifecycle, leadership transitions, and reconciliation logic is thorough. internal/scheduler/scheduling/runner/async.go — the reconcile helpers updatePreemptedJobs and the UpdateFairShares call in reconcile() assume AdditionalSchedulingInfo/EvictorResult/SchedulingContext are non-nil; worth adding nil guards for future-proofing. Important Files Changed
Sequence DiagramsequenceDiagram
participant ML as Main Loop
participant AR as AsyncRunner
participant BG as Background Goroutine
participant DB as JobDb
ML->>AR: Reset() [on become-leader]
AR-->>ML: (state → Idle)
ML->>AR: "GetSchedulerResult() [shouldSchedule=true, no result]"
AR-->>ML: "nil, nil (state=Idle)"
ML->>AR: Trigger() [after clean commit]
AR->>BG: wake (state → RunRequested → Running)
BG->>DB: DryRunTxn()
BG->>BG: schedulingAlgo.Schedule(runCtx, txn)
BG-->>AR: finishRun(result) [state → ResultReady]
ML->>AR: GetSchedulerResult() [next shouldSchedule cycle]
AR->>AR: reconcile(result, currentTxn)
AR->>AR: upsertSchedulerResult(txn, result)
AR-->>ML: "*SchedulerResult (state → Idle)"
ML->>ML: Publish events, commit txn
ML->>AR: Trigger() [after clean commit]
AR->>BG: wake (state → RunRequested → Running)
Note over ML,AR: On error or leadership loss
ML->>AR: Reset() [become-leader / error path]
AR->>BG: cancel() [if Running]
BG-->>AR: done closed
AR-->>ML: (state → Idle, result discarded)
Reviews (3): Last reviewed commit: "Merge branch 'master' into async_schedul..." | Re-trigger Greptile |
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
This PR makes it so the scheduler can run the scheduling loop in 2 modes:
Motivation
The reason for this PR is that most job state transitions run through the scheduler (pending/running/succeeded/failed/preempted/cancelled etc), but they are getting delayed by the scheduling algo taking a long time, resulting in poor UX.
By splitting the scheduling_algo into a separate routine that gets its result merged in when ready, we can have a far more responsive system
Caveats
When the scheduling algo runs, it maintains a snapshot of the state at the time it started. As time passes it inevitably gets more and more out of date
Meaning we can end up scheduling a job that has been cancelled on the main loop.
There is reconciliation code to remove decisions on jobs that have since finished