fix: honor MaxSessionDuration in placeholder expiry ZSet score#276
Conversation
Signed-off-by: Sanchit2662 <sanchit2662@gmail.com>
There was a problem hiding this comment.
Code Review
This pull request updates the sandbox placeholder creation logic to use the ShutdownTime from the sandbox specification if available, falling back to the default TTL otherwise. It also adds a table-driven test suite to cover these scenarios. The review feedback recommends refactoring the expiration logic to avoid redundant system clock calls and tightening the test assertions to use exact equality for better precision.
| expiresAt := time.Now().Add(DefaultSandboxTTL) | ||
| if sandboxCR.Spec.Lifecycle.ShutdownTime != nil { | ||
| expiresAt = sandboxCR.Spec.Lifecycle.ShutdownTime.Time | ||
| } |
There was a problem hiding this comment.
The calculation of time.Now().Add(DefaultSandboxTTL) is redundant when sandboxCR.Spec.Lifecycle.ShutdownTime is present, as it is immediately overwritten. While the performance impact is minimal, refactoring to an if/else block improves clarity and avoids an unnecessary system clock call in the common case where a specific shutdown time is configured.
var expiresAt time.Time
if sandboxCR.Spec.Lifecycle.ShutdownTime != nil {
expiresAt = sandboxCR.Spec.Lifecycle.ShutdownTime.Time
} else {
expiresAt = time.Now().Add(DefaultSandboxTTL)
}| }, | ||
| validate: func(t *testing.T, result *types.SandboxInfo) { | ||
| expected := now.Add(24 * time.Hour) | ||
| assert.WithinDuration(t, expected, result.ExpiresAt, 2*time.Second) |
There was a problem hiding this comment.
In this test case, result.ExpiresAt is directly assigned from sandboxCR.Spec.Lifecycle.ShutdownTime.Time, which was initialized with expected. Using assert.WithinDuration with a 2-second margin is unnecessarily loose. Using assert.Equal would more precisely verify that the absolute time from the CR is honored exactly, ensuring that the logic isn't incorrectly falling back to a time.Now() based calculation.
| assert.WithinDuration(t, expected, result.ExpiresAt, 2*time.Second) | |
| assert.Equal(t, expected, result.ExpiresAt) |
| }, | ||
| validate: func(t *testing.T, result *types.SandboxInfo) { | ||
| expected := now.Add(30 * time.Minute) | ||
| assert.WithinDuration(t, expected, result.ExpiresAt, 2*time.Second) |
There was a problem hiding this comment.
Similar to the previous case, since result.ExpiresAt is expected to match the absolute time provided in the sandboxCR, consider using assert.Equal to verify exact precision. This confirms that the configured ShutdownTime is used directly without any unintended drift or recalculation.
| assert.WithinDuration(t, expected, result.ExpiresAt, 2*time.Second) | |
| assert.Equal(t, expected, result.ExpiresAt) |
There was a problem hiding this comment.
Pull request overview
Fixes sandbox GC expiry by making the placeholder SandboxInfo.ExpiresAt honor the sandbox CR’s Spec.Lifecycle.ShutdownTime (which is derived from user-configured MaxSessionDuration) instead of always using the 8h default TTL.
Changes:
- Update
buildSandboxPlaceHolderto prefersandboxCR.Spec.Lifecycle.ShutdownTimewhen present. - Add table-driven unit tests covering placeholder expiry fallback and override cases.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pkg/workloadmanager/sandbox_helper.go | Uses Spec.Lifecycle.ShutdownTime to compute placeholder ExpiresAt (affects GC ZSet score via StoreSandbox). |
| pkg/workloadmanager/sandbox_helper_test.go | Adds tests validating placeholder expiry behavior with/without ShutdownTime. |
| expiresAt := time.Now().Add(DefaultSandboxTTL) | ||
| if sandboxCR.Spec.Lifecycle.ShutdownTime != nil { | ||
| expiresAt = sandboxCR.Spec.Lifecycle.ShutdownTime.Time | ||
| } |
There was a problem hiding this comment.
buildSandboxPlaceHolder only honors Spec.Lifecycle.ShutdownTime if it is populated on the Sandbox object. In the warm-pool CodeInterpreter path (buildSandboxByCodeInterpreter when WarmPoolSize > 0), the code constructs a simpleSandbox without Spec.Lifecycle.ShutdownTime, so this function will still fall back to DefaultSandboxTTL and the GC expiry ZSet score will continue to ignore MaxSessionDuration for warm-pool sessions. Consider ensuring the warm-pool path populates ShutdownTime on the placeholder Sandbox (or pass the desired expiry/ttl into buildSandboxPlaceHolder) so the stored ZSet score reflects the configured duration in all flows.
| func TestBuildSandboxPlaceHolder_TableDriven(t *testing.T) { | ||
| now := time.Now() | ||
|
|
||
| tests := []struct { | ||
| name string | ||
| setupSandbox func() *sandboxv1alpha1.Sandbox | ||
| entry *sandboxEntry | ||
| validate func(t *testing.T, result *types.SandboxInfo) | ||
| }{ | ||
| { | ||
| name: "no ShutdownTime falls back to DefaultSandboxTTL", | ||
| setupSandbox: func() *sandboxv1alpha1.Sandbox { | ||
| return &sandboxv1alpha1.Sandbox{ | ||
| ObjectMeta: metav1.ObjectMeta{ | ||
| Name: "test-sandbox", | ||
| Namespace: "default", | ||
| }, | ||
| } | ||
| }, | ||
| entry: &sandboxEntry{ | ||
| Kind: types.SandboxKind, | ||
| SessionID: "session-123", | ||
| }, | ||
| validate: func(t *testing.T, result *types.SandboxInfo) { | ||
| expected := now.Add(DefaultSandboxTTL) | ||
| assert.WithinDuration(t, expected, result.ExpiresAt, 2*time.Second) | ||
| assert.Equal(t, "creating", result.Status) | ||
| assert.Equal(t, "session-123", result.SessionID) | ||
| }, | ||
| }, | ||
| { | ||
| name: "ShutdownTime set to 24h is used as ExpiresAt", | ||
| setupSandbox: func() *sandboxv1alpha1.Sandbox { | ||
| shutdownTime := now.Add(24 * time.Hour) | ||
| return &sandboxv1alpha1.Sandbox{ | ||
| ObjectMeta: metav1.ObjectMeta{ | ||
| Name: "test-sandbox", | ||
| Namespace: "default", | ||
| }, | ||
| Spec: sandboxv1alpha1.SandboxSpec{ | ||
| Lifecycle: sandboxv1alpha1.Lifecycle{ | ||
| ShutdownTime: &metav1.Time{Time: shutdownTime}, | ||
| }, | ||
| }, | ||
| } | ||
| }, | ||
| entry: &sandboxEntry{ | ||
| Kind: types.SandboxKind, | ||
| SessionID: "session-456", | ||
| }, | ||
| validate: func(t *testing.T, result *types.SandboxInfo) { | ||
| expected := now.Add(24 * time.Hour) | ||
| assert.WithinDuration(t, expected, result.ExpiresAt, 2*time.Second) | ||
| }, | ||
| }, | ||
| { | ||
| name: "ShutdownTime set to 30m overrides DefaultSandboxTTL", | ||
| setupSandbox: func() *sandboxv1alpha1.Sandbox { | ||
| shutdownTime := now.Add(30 * time.Minute) | ||
| return &sandboxv1alpha1.Sandbox{ | ||
| ObjectMeta: metav1.ObjectMeta{ | ||
| Name: "short-sandbox", | ||
| Namespace: "default", | ||
| }, | ||
| Spec: sandboxv1alpha1.SandboxSpec{ | ||
| Lifecycle: sandboxv1alpha1.Lifecycle{ | ||
| ShutdownTime: &metav1.Time{Time: shutdownTime}, | ||
| }, | ||
| }, | ||
| } | ||
| }, | ||
| entry: &sandboxEntry{ | ||
| Kind: types.SandboxClaimsKind, | ||
| SessionID: "session-789", | ||
| }, | ||
| validate: func(t *testing.T, result *types.SandboxInfo) { | ||
| expected := now.Add(30 * time.Minute) | ||
| assert.WithinDuration(t, expected, result.ExpiresAt, 2*time.Second) | ||
| // Must NOT be 8h (DefaultSandboxTTL) | ||
| assert.True(t, result.ExpiresAt.Before(now.Add(DefaultSandboxTTL)), | ||
| "ExpiresAt should be 30m, not the 8h default") | ||
| }, | ||
| }, | ||
| } | ||
|
|
||
| for _, tt := range tests { | ||
| t.Run(tt.name, func(t *testing.T) { | ||
| sandbox := tt.setupSandbox() | ||
| result := buildSandboxPlaceHolder(sandbox, tt.entry) | ||
| tt.validate(t, result) | ||
| }) | ||
| } | ||
| } |
There was a problem hiding this comment.
The new table-driven tests validate buildSandboxPlaceHolder behavior when ShutdownTime is present/absent on the Sandbox object, but they don't cover the warm-pool CodeInterpreter flow where the placeholder Sandbox is constructed without Spec.Lifecycle.ShutdownTime. Adding a regression test for that warm-pool path (ensuring ShutdownTime/expiry reflects MaxSessionDuration) would help prevent this bug from recurring.
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #276 +/- ##
==========================================
+ Coverage 35.60% 43.43% +7.82%
==========================================
Files 29 30 +1
Lines 2533 2618 +85
==========================================
+ Hits 902 1137 +235
+ Misses 1505 1358 -147
+ Partials 126 123 -3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
- refactor buildSandboxPlaceHolder to use if/else instead of assign-then-overwrite, avoiding a redundant time.Now() call - set ShutdownTime on simpleSandbox in the warm-pool CodeInterpreter
path so MaxSessionDuration is honored in the GC expiry ZSet for
warm-pool sessions too
- switch assert.WithinDuration to assert.Equal for ShutdownTime-set test cases since those are exact values, not time.Now()-based
- add warm-pool regression test case to prevent the bug recurring
Signed-off-by: Sanchit2662 <sanchit2662@gmail.com>
|
hey @hzxuzhonghu , I have addressed a few of the Copilot suggestions. switched to if/else in buildSandboxPlaceHolder to skip the unnecessary time.Now() call when ShutdownTime is already set. also fixed the warm-pool path in workload_builder.go since simpleSandbox was being built without ShutdownTime, so MaxSessionDuration was still being ignored there even after the original fix. tightened the test assertions to assert.Equal for the exact-value cases and added a regression test for the warm-pool flow too. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: hzxuzhonghu The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind bug
What this PR does / why we need it:
so basically
buildSandboxPlaceHolderwas always settingExpiresAttotime.Now().Add(DefaultSandboxTTL)which is a hardcoded 8h, no matter whatMaxSessionDurationthe user had set on their CodeInterpreter or AgentRuntime CR.the problem is
StoreSandboxwrites that expiry score into the Redis/Valkey ZSet right at placeholder time, andUpdateSandbox(called later when the sandbox is actually ready) only updates the JSON blob ; it doesn't touch the ZSet. so the GC always saw 8h, not whatever the user configured.the fix is pretty small , just read
Spec.Lifecycle.ShutdownTimefrom the sandbox CR inbuildSandboxPlaceHolderif it's set, and only fall back toDefaultSandboxTTLwhen it's nil. the sandbox CR already hasShutdownTimepopulated bybuildSandboxObjectbeforebuildSandboxPlaceHolderis called, so no extra plumbing needed. also added table-driven tests for this insandbox_helper_test.gocovering the three cases: no ShutdownTime (fallback), 24h duration, and 30m duration.Special notes for your reviewer:
the pattern I used in
buildSandboxPlaceHolderis the same one already inbuildSandboxInfo, so it should look familiar. no changes to the store interface or any other layer.Does this PR introduce a user-facing change?: