Skip to content

fix: honor MaxSessionDuration in placeholder expiry ZSet score#276

Merged
volcano-sh-bot merged 2 commits intovolcano-sh:mainfrom
Sanchit2662:fix/max-session-duration-expiry-zset
Apr 16, 2026
Merged

fix: honor MaxSessionDuration in placeholder expiry ZSet score#276
volcano-sh-bot merged 2 commits intovolcano-sh:mainfrom
Sanchit2662:fix/max-session-duration-expiry-zset

Conversation

@Sanchit2662
Copy link
Copy Markdown
Contributor

What type of PR is this?

/kind bug

What this PR does / why we need it:

so basically buildSandboxPlaceHolder was always setting ExpiresAt to time.Now().Add(DefaultSandboxTTL) which is a hardcoded 8h, no matter what MaxSessionDuration the user had set on their CodeInterpreter or AgentRuntime CR.
the problem is StoreSandbox writes that expiry score into the Redis/Valkey ZSet right at placeholder time, and UpdateSandbox (called later when the sandbox is actually ready) only updates the JSON blob ; it doesn't touch the ZSet. so the GC always saw 8h, not whatever the user configured.

the fix is pretty small , just read Spec.Lifecycle.ShutdownTime from the sandbox CR in buildSandboxPlaceHolder if it's set, and only fall back to DefaultSandboxTTL when it's nil. the sandbox CR already has ShutdownTime populated by buildSandboxObject before buildSandboxPlaceHolder is called, so no extra plumbing needed. also added table-driven tests for this in sandbox_helper_test.go covering the three cases: no ShutdownTime (fallback), 24h duration, and 30m duration.

Special notes for your reviewer:

the pattern I used in buildSandboxPlaceHolder is the same one already in buildSandboxInfo , so it should look familiar. no changes to the store interface or any other layer.

Does this PR introduce a user-facing change?:

Fixed a bug where `MaxSessionDuration` set on CodeInterpreter or AgentRuntime was
being ignored by the GC. Sandboxes were always expiring at 8h regardless of what                                                                                   
was configured. Now the expiry ZSet score correctly reflects the user's configured                                                                                 
`MaxSessionDuration`.                               

Signed-off-by: Sanchit2662 <sanchit2662@gmail.com>
Copilot AI review requested due to automatic review settings April 15, 2026 13:04
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the sandbox placeholder creation logic to use the ShutdownTime from the sandbox specification if available, falling back to the default TTL otherwise. It also adds a table-driven test suite to cover these scenarios. The review feedback recommends refactoring the expiration logic to avoid redundant system clock calls and tightening the test assertions to use exact equality for better precision.

Comment thread pkg/workloadmanager/sandbox_helper.go Outdated
Comment on lines +30 to +33
expiresAt := time.Now().Add(DefaultSandboxTTL)
if sandboxCR.Spec.Lifecycle.ShutdownTime != nil {
expiresAt = sandboxCR.Spec.Lifecycle.ShutdownTime.Time
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The calculation of time.Now().Add(DefaultSandboxTTL) is redundant when sandboxCR.Spec.Lifecycle.ShutdownTime is present, as it is immediately overwritten. While the performance impact is minimal, refactoring to an if/else block improves clarity and avoids an unnecessary system clock call in the common case where a specific shutdown time is configured.

	var expiresAt time.Time
	if sandboxCR.Spec.Lifecycle.ShutdownTime != nil {
		expiresAt = sandboxCR.Spec.Lifecycle.ShutdownTime.Time
	} else {
		expiresAt = time.Now().Add(DefaultSandboxTTL)
	}

},
validate: func(t *testing.T, result *types.SandboxInfo) {
expected := now.Add(24 * time.Hour)
assert.WithinDuration(t, expected, result.ExpiresAt, 2*time.Second)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In this test case, result.ExpiresAt is directly assigned from sandboxCR.Spec.Lifecycle.ShutdownTime.Time, which was initialized with expected. Using assert.WithinDuration with a 2-second margin is unnecessarily loose. Using assert.Equal would more precisely verify that the absolute time from the CR is honored exactly, ensuring that the logic isn't incorrectly falling back to a time.Now() based calculation.

Suggested change
assert.WithinDuration(t, expected, result.ExpiresAt, 2*time.Second)
assert.Equal(t, expected, result.ExpiresAt)

},
validate: func(t *testing.T, result *types.SandboxInfo) {
expected := now.Add(30 * time.Minute)
assert.WithinDuration(t, expected, result.ExpiresAt, 2*time.Second)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Similar to the previous case, since result.ExpiresAt is expected to match the absolute time provided in the sandboxCR, consider using assert.Equal to verify exact precision. This confirms that the configured ShutdownTime is used directly without any unintended drift or recalculation.

Suggested change
assert.WithinDuration(t, expected, result.ExpiresAt, 2*time.Second)
assert.Equal(t, expected, result.ExpiresAt)

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes sandbox GC expiry by making the placeholder SandboxInfo.ExpiresAt honor the sandbox CR’s Spec.Lifecycle.ShutdownTime (which is derived from user-configured MaxSessionDuration) instead of always using the 8h default TTL.

Changes:

  • Update buildSandboxPlaceHolder to prefer sandboxCR.Spec.Lifecycle.ShutdownTime when present.
  • Add table-driven unit tests covering placeholder expiry fallback and override cases.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
pkg/workloadmanager/sandbox_helper.go Uses Spec.Lifecycle.ShutdownTime to compute placeholder ExpiresAt (affects GC ZSet score via StoreSandbox).
pkg/workloadmanager/sandbox_helper_test.go Adds tests validating placeholder expiry behavior with/without ShutdownTime.

Comment thread pkg/workloadmanager/sandbox_helper.go Outdated
Comment on lines +30 to +33
expiresAt := time.Now().Add(DefaultSandboxTTL)
if sandboxCR.Spec.Lifecycle.ShutdownTime != nil {
expiresAt = sandboxCR.Spec.Lifecycle.ShutdownTime.Time
}
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildSandboxPlaceHolder only honors Spec.Lifecycle.ShutdownTime if it is populated on the Sandbox object. In the warm-pool CodeInterpreter path (buildSandboxByCodeInterpreter when WarmPoolSize > 0), the code constructs a simpleSandbox without Spec.Lifecycle.ShutdownTime, so this function will still fall back to DefaultSandboxTTL and the GC expiry ZSet score will continue to ignore MaxSessionDuration for warm-pool sessions. Consider ensuring the warm-pool path populates ShutdownTime on the placeholder Sandbox (or pass the desired expiry/ttl into buildSandboxPlaceHolder) so the stored ZSet score reflects the configured duration in all flows.

Copilot uses AI. Check for mistakes.
Comment on lines +33 to +125
func TestBuildSandboxPlaceHolder_TableDriven(t *testing.T) {
now := time.Now()

tests := []struct {
name string
setupSandbox func() *sandboxv1alpha1.Sandbox
entry *sandboxEntry
validate func(t *testing.T, result *types.SandboxInfo)
}{
{
name: "no ShutdownTime falls back to DefaultSandboxTTL",
setupSandbox: func() *sandboxv1alpha1.Sandbox {
return &sandboxv1alpha1.Sandbox{
ObjectMeta: metav1.ObjectMeta{
Name: "test-sandbox",
Namespace: "default",
},
}
},
entry: &sandboxEntry{
Kind: types.SandboxKind,
SessionID: "session-123",
},
validate: func(t *testing.T, result *types.SandboxInfo) {
expected := now.Add(DefaultSandboxTTL)
assert.WithinDuration(t, expected, result.ExpiresAt, 2*time.Second)
assert.Equal(t, "creating", result.Status)
assert.Equal(t, "session-123", result.SessionID)
},
},
{
name: "ShutdownTime set to 24h is used as ExpiresAt",
setupSandbox: func() *sandboxv1alpha1.Sandbox {
shutdownTime := now.Add(24 * time.Hour)
return &sandboxv1alpha1.Sandbox{
ObjectMeta: metav1.ObjectMeta{
Name: "test-sandbox",
Namespace: "default",
},
Spec: sandboxv1alpha1.SandboxSpec{
Lifecycle: sandboxv1alpha1.Lifecycle{
ShutdownTime: &metav1.Time{Time: shutdownTime},
},
},
}
},
entry: &sandboxEntry{
Kind: types.SandboxKind,
SessionID: "session-456",
},
validate: func(t *testing.T, result *types.SandboxInfo) {
expected := now.Add(24 * time.Hour)
assert.WithinDuration(t, expected, result.ExpiresAt, 2*time.Second)
},
},
{
name: "ShutdownTime set to 30m overrides DefaultSandboxTTL",
setupSandbox: func() *sandboxv1alpha1.Sandbox {
shutdownTime := now.Add(30 * time.Minute)
return &sandboxv1alpha1.Sandbox{
ObjectMeta: metav1.ObjectMeta{
Name: "short-sandbox",
Namespace: "default",
},
Spec: sandboxv1alpha1.SandboxSpec{
Lifecycle: sandboxv1alpha1.Lifecycle{
ShutdownTime: &metav1.Time{Time: shutdownTime},
},
},
}
},
entry: &sandboxEntry{
Kind: types.SandboxClaimsKind,
SessionID: "session-789",
},
validate: func(t *testing.T, result *types.SandboxInfo) {
expected := now.Add(30 * time.Minute)
assert.WithinDuration(t, expected, result.ExpiresAt, 2*time.Second)
// Must NOT be 8h (DefaultSandboxTTL)
assert.True(t, result.ExpiresAt.Before(now.Add(DefaultSandboxTTL)),
"ExpiresAt should be 30m, not the 8h default")
},
},
}

for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
sandbox := tt.setupSandbox()
result := buildSandboxPlaceHolder(sandbox, tt.entry)
tt.validate(t, result)
})
}
}
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new table-driven tests validate buildSandboxPlaceHolder behavior when ShutdownTime is present/absent on the Sandbox object, but they don't cover the warm-pool CodeInterpreter flow where the placeholder Sandbox is constructed without Spec.Lifecycle.ShutdownTime. Adding a regression test for that warm-pool path (ensuring ShutdownTime/expiry reflects MaxSessionDuration) would help prevent this bug from recurring.

Copilot generated this review using guidance from repository custom instructions.
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 15, 2026

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 66.66667% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 43.43%. Comparing base (845b798) to head (5a641f8).
⚠️ Report is 164 commits behind head on main.

Files with missing lines Patch % Lines
pkg/workloadmanager/workload_builder.go 0.00% 3 Missing ⚠️
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #276      +/-   ##
==========================================
+ Coverage   35.60%   43.43%   +7.82%     
==========================================
  Files          29       30       +1     
  Lines        2533     2618      +85     
==========================================
+ Hits          902     1137     +235     
+ Misses       1505     1358     -147     
+ Partials      126      123       -3     
Flag Coverage Δ
unittests 43.43% <66.66%> (+7.82%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

  - refactor buildSandboxPlaceHolder to use if/else instead of                                                                                                           assign-then-overwrite, avoiding a redundant time.Now() call                                                                                                        - set ShutdownTime on simpleSandbox in the warm-pool CodeInterpreter
    path so MaxSessionDuration is honored in the GC expiry ZSet for
    warm-pool sessions too
  - switch assert.WithinDuration to assert.Equal for ShutdownTime-set                                                                                                    test cases since those are exact values, not time.Now()-based
  - add warm-pool regression test case to prevent the bug recurring

Signed-off-by: Sanchit2662 <sanchit2662@gmail.com>
@Sanchit2662
Copy link
Copy Markdown
Contributor Author

hey @hzxuzhonghu , I have addressed a few of the Copilot suggestions. switched to if/else in buildSandboxPlaceHolder to skip the unnecessary time.Now() call when ShutdownTime is already set. also fixed the warm-pool path in workload_builder.go since simpleSandbox was being built without ShutdownTime, so MaxSessionDuration was still being ignored there even after the original fix. tightened the test assertions to assert.Equal for the exact-value cases and added a regression test for the warm-pool flow too.

Copy link
Copy Markdown
Member

@hzxuzhonghu hzxuzhonghu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@volcano-sh-bot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hzxuzhonghu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot merged commit ef2ee0f into volcano-sh:main Apr 16, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants