Fix SM utilization reporting for forked CUDA worker processes #199

Open
lyquid617 wants to merge 2 commits into Project-HAMi:main from lyquid617:fix-util-init

@lyquid617

Problem

For Python multiprocessing workloads using HAMi-core, Device_utilization_desc_of_container may stay at 0 even when GPU processes are actively running.

In the affected workload:

  • nvidia-smi pmon inside the container reports ~97% SM utilization per worker process.
  • nvmlDeviceGetProcessUtilization inside the container returns correct per-process samples.
  • vGPU_device_memory_usage_in_bytes is updated correctly.
  • Device_utilization_desc_of_container remains 0.
  • The shared cache has valid memory usage and updated last_kernel_time, but device_util[].sm_util stays 0.

Root Cause

postInit() starts the utilization watcher via init_utilization_watcher(), but postInit() itself is guarded by pthread_once(post_cuinit_flag) and is only reliably triggered through the cuInit() wrapper.

For forked Python multiprocessing workers, the child process may inherit a completed post_cuinit_flag from the parent. Later CUDA kernel launches in the worker call the hooked launch path, but postInit() is skipped, so host PID detection and utilization watcher initialization never happen in the worker process.
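
The inherited-flag behavior can be reproduced outside HAMi-core. The following is a minimal sketch, not code from this PR: demo_once, demo_init, and init_reruns_in_forked_child are hypothetical names. It assumes a single-threaded parent on Linux, where the once-control variable is ordinary process memory that fork() copies into the child.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_once_t demo_once = PTHREAD_ONCE_INIT;
static pid_t init_pid = 0; /* PID of the process that last ran demo_init */

/* Stand-in for a per-process initializer such as a watcher setup. */
static void demo_init(void) { init_pid = getpid(); }

/* Returns 1 if demo_init ran again in a forked child,
 * 0 if the child skipped it, -1 on error. */
int init_reruns_in_forked_child(void) {
    pthread_once(&demo_once, demo_init); /* parent completes the once-init */

    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        /* The child inherits the completed flag, so this call is a no-op
         * and init_pid still holds the parent's PID. */
        pthread_once(&demo_once, demo_init);
        _exit(init_pid == getpid() ? 1 : 0);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

Under those assumptions the function should return 0: the child never re-runs the initializer, which matches the skipped postInit() described above.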

There is also an existing multi-GPU bug in init_gpu_device_utilization(): the inner loop breaks after device 0, so only the first device is reset. See #148.
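
The effect of a stray break in a nested reset loop can be shown with a small sketch. The struct layout and function names here (proc_slot, reset_util_buggy, reset_util_fixed, and the helpers) are hypothetical and are not taken from init_gpu_device_utilization(); only the loop shape follows the description above.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEVICES 16

/* Hypothetical per-process slot; the real shared-cache layout may differ. */
struct proc_slot {
    unsigned int sm_util[MAX_DEVICES];
};

/* Buggy shape: the break exits the inner loop after device 0. */
void reset_util_buggy(struct proc_slot *slots, size_t nproc, size_t ndev) {
    assert(ndev <= MAX_DEVICES);
    for (size_t p = 0; p < nproc; p++) {
        for (size_t d = 0; d < ndev; d++) {
            slots[p].sm_util[d] = 0;
            break; /* erroneous: devices 1..ndev-1 are never reset */
        }
    }
}

/* Fixed shape: every device of every process slot is reset. */
void reset_util_fixed(struct proc_slot *slots, size_t nproc, size_t ndev) {
    assert(ndev <= MAX_DEVICES);
    for (size_t p = 0; p < nproc; p++) {
        for (size_t d = 0; d < ndev; d++) {
            slots[p].sm_util[d] = 0;
        }
    }
}

/* Helper: set every entry to the same value. */
void fill_util(struct proc_slot *slots, size_t nproc, size_t ndev,
               unsigned int value) {
    for (size_t p = 0; p < nproc; p++)
        for (size_t d = 0; d < ndev; d++)
            slots[p].sm_util[d] = value;
}

/* Helper: count entries that are still nonzero after a reset. */
size_t count_stale(const struct proc_slot *slots, size_t nproc, size_t ndev) {
    size_t stale = 0;
    for (size_t p = 0; p < nproc; p++)
        for (size_t d = 0; d < ndev; d++)
            if (slots[p].sm_util[d] != 0) stale++;
    return stale;
}
```

With 2 process slots and 4 devices, the buggy variant leaves 6 of the 8 entries untouched, while the fixed variant leaves none.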

Fix

  • Add ensure_post_init() and call it from kernel launch wrappers.
  • Reset post_cuinit_flag and pidfound in the child process after fork.
  • Remove the erroneous break in init_gpu_device_utilization().
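
The first two items can be sketched as follows. This is an illustration under stated assumptions, not the PR's diff: the names post_cuinit_flag, pidfound, and ensure_post_init mirror the description above, while post_init, hooked_launch, watcher_owner, and the use of pthread_atfork() to perform the reset are assumptions made for this example. Resetting a pthread_once_t by assignment is not guaranteed by POSIX; it is used here because it works with glibc's representation.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static const pthread_once_t fresh_once = PTHREAD_ONCE_INIT;
static pthread_once_t post_cuinit_flag = PTHREAD_ONCE_INIT;
static pthread_once_t atfork_flag = PTHREAD_ONCE_INIT;
static int pidfound = 0;
static pid_t watcher_owner = 0; /* stand-in for per-process watcher state */

/* Stand-in for postInit(): host PID detection plus watcher startup. */
static void post_init(void) {
    pidfound = 1;
    watcher_owner = getpid();
}

/* Runs in the child right after fork(): forget the parent's init state. */
static void reset_in_child(void) {
    post_cuinit_flag = fresh_once; /* assumption: assignment reset is valid */
    pidfound = 0;
}

static void register_atfork(void) {
    pthread_atfork(NULL, NULL, reset_in_child);
}

/* Cheap to call on every launch; does real work once per process. */
void ensure_post_init(void) {
    pthread_once(&atfork_flag, register_atfork);
    pthread_once(&post_cuinit_flag, post_init);
}

/* Stand-in for a hooked kernel launch wrapper.
 * Returns 1 if this process owns initialized watcher state. */
int hooked_launch(void) {
    ensure_post_init();
    return pidfound && watcher_owner == getpid();
}

/* Returns 1 if a forked child re-initializes on its first launch. */
int child_reinitializes_after_fork(void) {
    hooked_launch(); /* parent initializes first */
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) _exit(hooked_launch() ? 1 : 0);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

Calling ensure_post_init() from the launch path rather than only from cuInit() means a worker that never calls cuInit() itself still gets initialized on its first kernel launch.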

Validation

Tested with an 8-GPU Python multiprocessing workload:

Before:

  • nvidia-smi pmon: ~97% SM utilization
  • Device_utilization_desc_of_container: 0

After:

  • Device_utilization_desc_of_container: 97-98 per GPU
  • vGPU memory metrics remain correct

@hami-robot
Contributor

hami-robot Bot commented May 29, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: lyquid617
Once this PR has been reviewed and has the lgtm label, please assign archlitchi for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hami-robot
Contributor

hami-robot Bot commented May 29, 2026

Welcome @lyquid617! It looks like this is your first PR to Project-HAMi/HAMi-core 🎉

@hami-robot hami-robot Bot added the size/S label May 29, 2026
@archlitchi
Member

thanks for the fix, please sign-off your commit

Signed-off-by: lyquid <l000064@lassoquant.com>
Signed-off-by: lyquid <l000064@lassoquant.com>
@lyquid617
Author

> thanks for the fix, please sign-off your commit

fixed

2 participants