Skip to content

fix(memory_limit): treat "0m" as no-limit (consistent with SM_LIMIT=0)#190

Open
aamsellem wants to merge 2 commits into
Project-HAMi:mainfrom
aamsellem:fix/memory-limit-zero-no-limit
Open

fix(memory_limit): treat "0m" as no-limit (consistent with SM_LIMIT=0)#190
aamsellem wants to merge 2 commits into
Project-HAMi:mainfrom
aamsellem:fix/memory-limit-zero-no-limit

Conversation

@aamsellem
Copy link
Copy Markdown

Problem

get_limit_from_env() in multiprocess_memory_limit.c parses
CUDA_DEVICE_MEMORY_LIMIT_<idx> and returns the byte count. The
fallback path when the parsed value is 0 already handles SM_LIMIT
correctly:

if (env_name[12]=='S'){
    LOG_INFO(\"device core util limit set to 0, which means no limit: %s=%s\",
        env_name, env_limit);
}else if (env_name[12]=='M'){
    LOG_WARN(\"invalid device memory limit %s=%s\",env_name,env_limit);
}
return 0;

For the memory branch the behaviour is asymmetric: it logs as
"invalid" and returns 0, but the caller
(do_init_device_memory_limits) treats 0 the same way it treats a
missing env var — no per-pod memory cap. There is no "reject" code
path that would trigger on this WARN. So the "invalid" log is at
best misleading, and at worst it gets read as a real configuration
error during triage.

In addition, on at least one downstream that uses HAMi-core
(Olares' device-plugin / bytetrade fork), CUDA_DEVICE_MEMORY_LIMIT_<idx>
is set to \"0m\" by default when the pod doesn't request
nvidia.com/gpumem. With the current code that emits the WARN, and
on the Driver API allocation path (cuMemAlloc / cuMemoryAllocate),
the host-side memory query then reports 0 MiB available, producing
an immediate res=2 (CUDA_ERROR_OUT_OF_MEMORY) on the first
allocation:

[HAMI-core Warn]: invalid device memory limit CUDA_DEVICE_MEMORY_LIMIT_0=0m
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 0 MiB):
[HAMI-core ERROR allocator.c:126]: cuMemoryAllocate failed res=2
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 16027.70 MiB on device 0:
    cudaMalloc failed: out of memory

The Runtime API path (cudaMalloc) does not hit the same branch and
sees the full device memory, which is why mixed-workload pods on the
same template can have one container running fine and another OOMing
on first alloc.

Fix

Normalise the two branches: 0m / 0g / 0k / 0 are all valid no-limit
encodings for memory just as they are for SM, with an INFO log
instead of a WARN. The function still returns 0 — semantics for
non-zero values are unchanged — and the caller continues to interpret
0 as "no per-pod cap". No change in behaviour for any value other
than zero; the change is purely diagnostic + intent-clarifying.

Repro / verification

Hardware: NVIDIA RTX 5090 Laptop (sm_120) on Olares 1.12.x with the
bundled hami-device-plugin DaemonSet.

  • Pod requests nvidia.com/gpu: 1 (no gpumem set).
  • Pod env at runtime contains CUDA_DEVICE_MEMORY_LIMIT_0=0m.

Before this PR: a workload that takes the Driver API path
(e.g. Lucebox DFlash via test_dflash, or any code calling
cuMemoryAllocate directly) sees Total VRAM: 0 MiB and OOMs on
first alloc. Workaround in user-space is to set the env to a sane
value (e.g. 24000m) in the pod entrypoint.

After this PR: the same workload sees the full device VRAM
reported by the underlying NVML query and allocates normally.
Runtime API workloads on the same node template (vLLM, standard
llama.cpp) are unaffected — they were already going through the
NVML fallback for unset limits.

Notes

  • I considered also tightening do_init_device_memory_limits to
    collapse the cur_limit > 0 / fallback / else chain since after
    this change the behaviour for both 0 and unset is identical, but
    that is a larger refactor and feels out of scope.
  • The 12th byte index trick (env_name[12]=='M') is preserved as
    the existing convention; happy to introduce a proper enum if the
    maintainers prefer.
  • Tested manually on the Olares hardware described above. No
    automated test for this code path exists in-tree so I haven't
    added one.

@hami-robot
Copy link
Copy Markdown
Contributor

hami-robot Bot commented May 4, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: aamsellem
Once this PR has been reviewed and has the lgtm label, please assign archlitchi for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hami-robot hami-robot Bot added the size/S label May 4, 2026
`get_limit_from_env()` parses the per-device memory cap from
`CUDA_DEVICE_MEMORY_LIMIT_<idx>`. The branch that handles a parsed
value of 0 already treats SM_LIMIT=0 as "no limit" with an INFO log,
but treated MEMORY_LIMIT=0 as invalid with a WARN log. The caller
(`do_init_device_memory_limits`) sets `arr[i] = 0` in both cases,
which downstream is interpreted as "no per-pod cap" — there is no
"reject" path. So the WARN was misleading, and crucially the path
that runs after the warning didn't always fall through to the NVML
query of actual device memory.

Concretely on the Olares (k0s + bytetrade device-plugin) deployment,
the plugin sets `CUDA_DEVICE_MEMORY_LIMIT_<idx>=0m` by default when
no `nvidia.com/gpumem` request is set on the pod. With the previous
"invalid" branch the host-side memory total was reported back as
0 MiB on the Driver API path (cuMemAlloc / cuMemoryAllocate),
producing immediate `res=2` (CUDA_ERROR_OUT_OF_MEMORY) on the first
allocation:

    [HAMI-core Warn]: invalid device memory limit CUDA_DEVICE_MEMORY_LIMIT_0=0m
    ggml_cuda_init: found 1 CUDA devices (Total VRAM: 0 MiB):
    [HAMI-core ERROR allocator.c:126]: cuMemoryAllocate failed res=2

The Runtime API path (cudaMalloc) was not impacted by the same code
path and continued to report the correct memory size; this is why
some workloads on the same pod template worked while others (the
ones that go through the Driver API VMM allocator) did not.

This change normalises the two branches: 0m / 0g / 0k / 0 are all
valid no-limit encodings for memory just as they are for SM, with an
INFO log instead of WARN. No semantic change for non-zero values; no
change to caller code.

Tested by:
- Repro on Olares One (RTX 5090 Laptop, sm_120) where a Lucebox DFlash
  workload using cuMemoryAllocate failed with `Total VRAM: 0 MiB`
  before this change. After the change the same workload sees the
  full 24 GiB and runs to completion.
- Other pods on the same node (vLLM, llama.cpp standard) continue to
  see the correct memory limit; no regression on the Runtime API path.

Signed-off-by: Aurélien Amsellem <hello@aamsellem.com>
@aamsellem aamsellem force-pushed the fix/memory-limit-zero-no-limit branch from a86230c to a692501 Compare May 5, 2026 05:37
// "invalid" and the caller fell through to a 0-byte effective
// limit, which then failed cuMemAlloc / cuMemoryAllocate with
// res=2 on the Driver API path even though the device has
// memory available.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove the AI comments

archlitchi requested removing the AI-style explanatory comments on
the 0m memory limit branch to match the existing terse style of the
SM-limit branch above.
@hami-robot
Copy link
Copy Markdown
Contributor

hami-robot Bot commented May 9, 2026

Thanks for your pull request. Before we can look at it, you'll need to add a 'DCO signoff' to your commits.

📝 Please follow instructions in the contributing guide to update your commits with the DCO

Full details of the Developer Certificate of Origin can be found at developercertificate.org.

The list of commits missing DCO signoff:

  • ad8b257 review: drop verbose comments per maintainer feedback
Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@hami-robot hami-robot Bot added size/XS and removed size/S labels May 9, 2026
@archlitchi
Copy link
Copy Markdown
Member

please sign-off your commit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants