fix(memory_limit): treat "0m" as no-limit (consistent with SM_LIMIT=0) by aamsellem · Pull Request #190 · Project-HAMi/HAMi-core

aamsellem · 2026-05-04T21:01:51Z

Problem

get_limit_from_env() in multiprocess_memory_limit.c parses
CUDA_DEVICE_MEMORY_LIMIT_<idx> and returns the byte count. The
fallback path when the parsed value is 0 already handles SM_LIMIT
correctly:

if (env_name[12]=='S'){
    LOG_INFO(\"device core util limit set to 0, which means no limit: %s=%s\",
        env_name, env_limit);
}else if (env_name[12]=='M'){
    LOG_WARN(\"invalid device memory limit %s=%s\",env_name,env_limit);
}
return 0;

For the memory branch the behaviour is asymmetric: it logs as
"invalid" and returns 0, but the caller
(do_init_device_memory_limits) treats 0 the same way it treats a
missing env var — no per-pod memory cap. There is no "reject" code
path that would trigger on this WARN. So the "invalid" log is at
best misleading, and at worst it gets read as a real configuration
error during triage.

In addition, on at least one downstream that uses HAMi-core
(Olares' device-plugin / bytetrade fork), CUDA_DEVICE_MEMORY_LIMIT_<idx>
is set to \"0m\" by default when the pod doesn't request
nvidia.com/gpumem. With the current code that emits the WARN, and
on the Driver API allocation path (cuMemAlloc / cuMemoryAllocate),
the host-side memory query then reports 0 MiB available, producing
an immediate res=2 (CUDA_ERROR_OUT_OF_MEMORY) on the first
allocation:

[HAMI-core Warn]: invalid device memory limit CUDA_DEVICE_MEMORY_LIMIT_0=0m
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 0 MiB):
[HAMI-core ERROR allocator.c:126]: cuMemoryAllocate failed res=2
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16027.70 MiB on device 0:
    cudaMalloc failed: out of memory

The Runtime API path (cudaMalloc) does not hit the same branch and
sees the full device memory, which is why mixed-workload pods on the
same template can have one container running fine and another OOMing
on first alloc.

Fix

Normalise the two branches: 0m / 0g / 0k / 0 are all valid no-limit
encodings for memory just as they are for SM, with an INFO log
instead of a WARN. The function still returns 0 — semantics for
non-zero values are unchanged — and the caller continues to interpret
0 as "no per-pod cap". No change in behaviour for any value other
than zero; the change is purely diagnostic + intent-clarifying.

Repro / verification

Hardware: NVIDIA RTX 5090 Laptop (sm_120) on Olares 1.12.x with the
bundled hami-device-plugin DaemonSet.

Pod requests nvidia.com/gpu: 1 (no gpumem set).
Pod env at runtime contains CUDA_DEVICE_MEMORY_LIMIT_0=0m.

Before this PR: a workload that takes the Driver API path
(e.g. Lucebox DFlash via test_dflash, or any code calling
cuMemoryAllocate directly) sees Total VRAM: 0 MiB and OOMs on
first alloc. Workaround in user-space is to set the env to a sane
value (e.g. 24000m) in the pod entrypoint.

After this PR: the same workload sees the full device VRAM
reported by the underlying NVML query and allocates normally.
Runtime API workloads on the same node template (vLLM, standard
llama.cpp) are unaffected — they were already going through the
NVML fallback for unset limits.

Notes

I considered also tightening do_init_device_memory_limits to
collapse the cur_limit > 0 / fallback / else chain since after
this change the behaviour for both 0 and unset is identical, but
that is a larger refactor and feels out of scope.
The 12th byte index trick (env_name[12]=='M') is preserved as
the existing convention; happy to introduce a proper enum if the
maintainers prefer.
Tested manually on the Olares hardware described above. No
automated test for this code path exists in-tree so I haven't
added one.

hami-robot · 2026-05-04T21:01:56Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: aamsellem
Once this PR has been reviewed and has the lgtm label, please assign archlitchi for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

`get_limit_from_env()` parses the per-device memory cap from `CUDA_DEVICE_MEMORY_LIMIT_<idx>`. The branch that handles a parsed value of 0 already treats SM_LIMIT=0 as "no limit" with an INFO log, but treated MEMORY_LIMIT=0 as invalid with a WARN log. The caller (`do_init_device_memory_limits`) sets `arr[i] = 0` in both cases, which downstream is interpreted as "no per-pod cap" — there is no "reject" path. So the WARN was misleading, and crucially the path that runs after the warning didn't always fall through to the NVML query of actual device memory. Concretely on the Olares (k0s + bytetrade device-plugin) deployment, the plugin sets `CUDA_DEVICE_MEMORY_LIMIT_<idx>=0m` by default when no `nvidia.com/gpumem` request is set on the pod. With the previous "invalid" branch the host-side memory total was reported back as 0 MiB on the Driver API path (cuMemAlloc / cuMemoryAllocate), producing immediate `res=2` (CUDA_ERROR_OUT_OF_MEMORY) on the first allocation: [HAMI-core Warn]: invalid device memory limit CUDA_DEVICE_MEMORY_LIMIT_0=0m ggml_cuda_init: found 1 CUDA devices (Total VRAM: 0 MiB): [HAMI-core ERROR allocator.c:126]: cuMemoryAllocate failed res=2 The Runtime API path (cudaMalloc) was not impacted by the same code path and continued to report the correct memory size; this is why some workloads on the same pod template worked while others (the ones that go through the Driver API VMM allocator) did not. This change normalises the two branches: 0m / 0g / 0k / 0 are all valid no-limit encodings for memory just as they are for SM, with an INFO log instead of WARN. No semantic change for non-zero values; no change to caller code. Tested by: - Repro on Olares One (RTX 5090 Laptop, sm_120) where a Lucebox DFlash workload using cuMemoryAllocate failed with `Total VRAM: 0 MiB` before this change. After the change the same workload sees the full 24 GiB and runs to completion. - Other pods on the same node (vLLM, llama.cpp standard) continue to see the correct memory limit; no regression on the Runtime API path. Signed-off-by: Aurélien Amsellem <hello@aamsellem.com>

archlitchi · 2026-05-09T06:15:46Z

+            // "invalid" and the caller fell through to a 0-byte effective
+            // limit, which then failed cuMemAlloc / cuMemoryAllocate with
+            // res=2 on the Driver API path even though the device has
+            // memory available.


please remove the AI comments

archlitchi requested removing the AI-style explanatory comments on the 0m memory limit branch to match the existing terse style of the SM-limit branch above.

hami-robot · 2026-05-09T06:26:09Z

Thanks for your pull request. Before we can look at it, you'll need to add a 'DCO signoff' to your commits.

📝 Please follow instructions in the contributing guide to update your commits with the DCO

Full details of the Developer Certificate of Origin can be found at developercertificate.org.

The list of commits missing DCO signoff:

ad8b257 review: drop verbose comments per maintainer feedback

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

archlitchi · 2026-05-09T07:14:15Z

please sign-off your commit

hami-robot Bot added the dco-signoff: no label May 4, 2026

hami-robot Bot requested review from archlitchi and chaunceyjiang May 4, 2026 21:01

hami-robot Bot added the size/S label May 4, 2026

aamsellem force-pushed the fix/memory-limit-zero-no-limit branch from a86230c to a692501 Compare May 5, 2026 05:37

hami-robot Bot added dco-signoff: yes and removed dco-signoff: no labels May 5, 2026

archlitchi reviewed May 9, 2026

View reviewed changes

review: drop verbose comments per maintainer feedback

ad8b257

archlitchi requested removing the AI-style explanatory comments on the 0m memory limit branch to match the existing terse style of the SM-limit branch above.

hami-robot Bot added dco-signoff: no and removed dco-signoff: yes labels May 9, 2026

hami-robot Bot added size/XS and removed size/S labels May 9, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(memory_limit): treat "0m" as no-limit (consistent with SM_LIMIT=0)#190

fix(memory_limit): treat "0m" as no-limit (consistent with SM_LIMIT=0)#190
aamsellem wants to merge 2 commits into
Project-HAMi:mainfrom
aamsellem:fix/memory-limit-zero-no-limit

aamsellem commented May 4, 2026

Uh oh!

hami-robot Bot commented May 4, 2026

Uh oh!

archlitchi May 9, 2026

Uh oh!

hami-robot Bot commented May 9, 2026

Uh oh!

archlitchi commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

aamsellem commented May 4, 2026

Problem

Fix

Repro / verification

Notes

Uh oh!

hami-robot Bot commented May 4, 2026

Uh oh!

archlitchi May 9, 2026

Choose a reason for hiding this comment

Uh oh!

hami-robot Bot commented May 9, 2026

Uh oh!

archlitchi commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants