fix: prevent unsigned integer underflow in NVML memory info hook #201

Open
miaobyte wants to merge 1 commit into Project-HAMi:main from array2d:fix/nvml-memory-underflow
Conversation

@miaobyte

@miaobyte miaobyte commented Jun 4, 2026

Summary

Fix unsigned integer underflow in _nvmlDeviceGetMemoryInfo: when GPU memory usage exceeds the configured limit, (limit - usage) with unsigned size_t wraps to ~18 EB, breaking monitoring tools.
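The wraparound described above can be reproduced in isolation. The following is a minimal sketch, not code from the PR: `naive_free` and `check_wraparound` are hypothetical names, and the "~18 EB" figure assumes a 64-bit `size_t`, where `SIZE_MAX` is 2^64 - 1 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Naive computation (hypothetical helper): unsigned subtraction is
 * performed modulo 2^N, so the result wraps when usage > limit. */
static size_t naive_free(size_t limit, size_t usage) {
    return limit - usage;
}

/* Going one byte over the limit yields the largest size_t value
 * instead of a negative number. */
static void check_wraparound(void) {
    size_t limit = 1024;
    size_t usage = 1025;
    assert(naive_free(limit, usage) == SIZE_MAX);
}
```

Because the wrapped result is a valid `size_t`, nothing fails at the point of subtraction; the bad value only becomes visible when a monitoring tool displays it.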

Changes

src/nvml/hook.c: compute clamped = (usage > limit) ? limit : usage, then derive both used = clamped and free = limit - clamped. This guarantees free + used = total always holds and free never underflows.
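The clamping rule described above can be sketched as a standalone function. This is an illustration under stated assumptions, not the actual contents of `src/nvml/hook.c`: `struct mem_report` and `clamp_report` are hypothetical names, and `size_t` is used for all fields to match the PR description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the reported values; this is not the
 * struct used by HAMi or NVML. */
struct mem_report {
    size_t total;
    size_t used;
    size_t free;
};

/* Clamp usage to the limit first, then derive both used and free
 * from the clamped value so the subtraction can never wrap. */
static struct mem_report clamp_report(size_t limit, size_t usage) {
    size_t clamped = (usage > limit) ? limit : usage;
    struct mem_report r;
    r.total = limit;
    r.used = clamped;
    r.free = limit - clamped; /* clamped <= limit, so no underflow */
    assert(r.free + r.used == r.total);
    return r;
}
```

For example, `clamp_report(1000, 1200)` reports `used = 1000` and `free = 0`. One consequence of this design is that any usage above the limit is no longer visible in the reported `used` value.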

Fixes #200

🤖 Generated with Claude Code

@hami-robot hami-robot Bot requested a review from archlitchi June 4, 2026 08:16
@hami-robot
Contributor

hami-robot Bot commented Jun 4, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: miaobyte
Once this PR has been reviewed and has the lgtm label, please assign archlitchi for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hami-robot hami-robot Bot requested a review from chaunceyjiang June 4, 2026 08:16
@hami-robot hami-robot Bot added the size/XS label Jun 4, 2026
@miaobyte miaobyte force-pushed the fix/nvml-memory-underflow branch from 34a7e14 to 4c26dce Compare June 4, 2026 08:44
@miaobyte miaobyte force-pushed the fix/nvml-memory-underflow branch from 4c26dce to 58d0f3c Compare June 4, 2026 08:48
@hami-robot hami-robot Bot added size/L and removed size/XS labels Jun 4, 2026
When usage > limit, (limit - usage) wraps around to a massive unsigned
value (~18 EB). Clamp both free and used so that free + used = total
always holds, preventing corrupted values in monitoring tools.

Fixes Project-HAMi#200
Signed-off-by: peng.li24 <peng.li24@nio.com>

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@miaobyte miaobyte force-pushed the fix/nvml-memory-underflow branch from 58d0f3c to 668ecb8 Compare June 4, 2026 08:49
@hami-robot hami-robot Bot added size/XS and removed size/L labels Jun 4, 2026
@miaobyte miaobyte changed the title from "fix: prevent unsigned integer underflow when usage exceeds memory limit" to "fix: prevent unsigned integer underflow in NVML memory info hook" Jun 4, 2026
Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: NVML/CUDA memory hooks report corrupted values when usage exceeds limit (integer underflow)

1 participant