feat(nvidia): Vulkan vGPU support — memory partitioning via vulkan-layer + manifest auto-mount + admission webhook env injection by 100milliongold · Pull Request #1803 · Project-HAMi/HAMi

100milliongold · 2026-04-27T05:20:08Z

Summary

Adds Vulkan vGPU support to HAMi so that Vulkan workloads (Isaac Sim, ray tracing, GPU-accelerated rendering, etc.) honor the same per-container memory limit that HAMi already enforces for CUDA.

Three coordinated layers, all opt-in via the hami.io/vulkan: "true" pod annotation:

HAMi-core Vulkan layer — hooks vkAllocateMemory to enforce CUDA_DEVICE_MEMORY_LIMIT_0. The Vulkan implicit-layer manifest uses enable_environment: HAMI_VULKAN_ENABLE=1 so the layer is loaded only when the pod opts in.
Device-plugin — bind-mounts the Vulkan implicit-layer manifest from the host into the container at /etc/vulkan/implicit_layer.d/hami.json.
Admission webhook — when the pod has hami.io/vulkan: "true" and requests a GPU resource, injects HAMI_VULKAN_ENABLE=1 and merges graphics into NVIDIA_DRIVER_CAPABILITIES so the runtime exposes the NVIDIA Vulkan ICD.

Pods without the annotation are unaffected — bit-identical to current behavior.

Why

HAMi currently enforces vGPU memory limits only for CUDA workloads. Vulkan applications bypass the limit because vkAllocateMemory is not on the CUDA path. We hit this in production with Isaac Sim: Kit allocated memory through Vulkan, ignored the requested partition, and OOM'd the host. Hooking vkAllocateMemory in the HAMi-core layer closes the gap.
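
The accounting idea behind the hook can be modeled in a few lines. The real layer is native code inside HAMi-core; the Go below is only a conceptual sketch under stated assumptions: the `budget` type and its methods are hypothetical names, and returning an out-of-device-memory error on overrun is an assumed policy (standing in for Vulkan's `VK_ERROR_OUT_OF_DEVICE_MEMORY`), not something this PR description specifies.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errOutOfDeviceMemory stands in for VK_ERROR_OUT_OF_DEVICE_MEMORY.
// Which result the real hook returns is an assumption of this sketch.
var errOutOfDeviceMemory = errors.New("out of device memory")

// budget is a hypothetical per-container memory budget, e.g. the value
// parsed from CUDA_DEVICE_MEMORY_LIMIT_0.
type budget struct {
	mu    sync.Mutex
	limit uint64 // bytes the container may hold
	used  uint64 // bytes currently charged; never exceeds limit
}

// reserve is what a vkAllocateMemory hook would call before forwarding
// the allocation to the next layer or the driver.
func (b *budget) reserve(size uint64) error {
	b.mu.Lock()
	defer b.mu.Unlock()
	if size > b.limit-b.used {
		return errOutOfDeviceMemory
	}
	b.used += size
	return nil
}

// release is what the matching free hook would call.
func (b *budget) release(size uint64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if size > b.used {
		size = b.used // guard against releasing untracked memory
	}
	b.used -= size
}

func main() {
	b := &budget{limit: 4 << 30} // 4 GiB partition
	fmt.Println(b.reserve(3 << 30))
	fmt.Println(b.reserve(2 << 30)) // exceeds the remaining 1 GiB
	b.release(3 << 30)
	fmt.Println(b.reserve(2 << 30))
}
```

Charging CUDA and Vulkan allocations against one shared counter is what keeps a mixed workload inside a single partition.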

The opt-in annotation is intentional:

Avoids forcing the graphics capability on every CUDA pod (would change runtime behavior cluster-wide).
The enable_environment guard keeps the layer from loading when the env is not set, even if the manifest happens to be mounted.
Existing CUDA workloads remain unchanged.

What changed

| File / Area | Change |
| --- | --- |
| `libvgpu` submodule | → `xiilab/HAMi-core@8d4f712` (Vulkan layer with `vkAllocateMemory` hook, `cuMemFree[Async]` untracked-pointer fallback, `cuMemGetInfo_v2` OptiX crash fix). A companion PR against `Project-HAMi/HAMi-core` is needed before this can be merged upstream. Happy to open it once direction is agreed. |
| `docker/Dockerfile` | Install `libvulkan-dev` in the nvbuild stage; copy `hami.json` into the runtime image at `/k8s-vgpu/lib/nvidia/vulkan/implicit_layer.d/hami.json` |
| `pkg/device-plugin/nvidiadevice/nvinternal/plugin/server.go` | When `/usr/local/vgpu/vulkan/implicit_layer.d/hami.json` exists on the host, append a bind-mount into the container's Allocate response. Idempotent and side-effect-free when the file is absent. |
| `pkg/device/nvidia/device.go` | New `applyVulkanAnnotation`: when the pod carries `hami.io/vulkan: "true"`, sets `HAMI_VULKAN_ENABLE=1` and merges `graphics` into `NVIDIA_DRIVER_CAPABILITIES`. Called only when the container actually requests a GPU resource. |
| `pkg/device/nvidia/device_test.go` | TDD coverage for env injection: no-annotation no-op, capability merge, idempotency, edge cases (existing caps, empty caps, etc.). |
| `examples/nvidia/vulkan_example.yaml` | Minimal usage sample. |
| `docs/vulkan-vgpu-support.md` | English usage guide. |
| `docs/vulkan-vgpu-support_cn.md` | Chinese translation. |
| `docs/vulkan-vgpu-e2e-checklist.md` | Manual E2E verification checklist. |

How it works

The HAMi-core Vulkan layer hooks vkAllocateMemory to enforce the per-container memory limit set by HAMi-core's existing CUDA limit code (same CUDA_DEVICE_MEMORY_LIMIT_0 env).
The device-plugin mounts the implicit-layer manifest into the container so the Vulkan loader picks it up automatically.
The manifest's enable_environment: HAMI_VULKAN_ENABLE=1 guard means the layer isn't activated unless the env is set.
The admission webhook reads hami.io/vulkan: "true", sets the gating env, and merges graphics so the NVIDIA runtime exposes the Vulkan ICD libraries.
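
The webhook step above can be sketched as a small, idempotent env merge. This is a hedged approximation of what `applyVulkanAnnotation` is described as doing, not its actual source: `EnvVar` is a local stand-in for the Kubernetes container env type, and the helper names are hypothetical. The handling of an empty `NVIDIA_DRIVER_CAPABILITIES` is also an assumption: the sketch writes `compute,utility,graphics` so that the NVIDIA runtime's default capabilities are not lost once the variable becomes explicit.

```go
package main

import (
	"fmt"
	"strings"
)

// EnvVar is a local stand-in for the Kubernetes container env type.
type EnvVar struct{ Name, Value string }

const (
	vulkanAnnotation = "hami.io/vulkan"
	enableEnv        = "HAMI_VULKAN_ENABLE"
	capsEnv          = "NVIDIA_DRIVER_CAPABILITIES"
)

// getEnv returns the value of name, or "" when it is not set.
func getEnv(env []EnvVar, name string) string {
	for _, e := range env {
		if e.Name == name {
			return e.Value
		}
	}
	return ""
}

// setEnv updates name in place or appends it.
func setEnv(env []EnvVar, name, value string) []EnvVar {
	for i := range env {
		if env[i].Name == name {
			env[i].Value = value
			return env
		}
	}
	return append(env, EnvVar{Name: name, Value: value})
}

// mergeGraphics adds "graphics" unless it, or "all", is already listed.
func mergeGraphics(current string) string {
	if strings.TrimSpace(current) == "" {
		return "compute,utility,graphics" // assumption: keep the runtime defaults
	}
	for _, c := range strings.Split(current, ",") {
		switch strings.TrimSpace(c) {
		case "graphics", "all":
			return current
		}
	}
	return current + ",graphics"
}

// applyVulkanAnnotation is a no-op unless the pod opted in.
func applyVulkanAnnotation(annotations map[string]string, env []EnvVar) []EnvVar {
	if annotations[vulkanAnnotation] != "true" {
		return env
	}
	env = setEnv(env, enableEnv, "1")
	return setEnv(env, capsEnv, mergeGraphics(getEnv(env, capsEnv)))
}

func main() {
	ann := map[string]string{vulkanAnnotation: "true"}
	env := []EnvVar{{Name: capsEnv, Value: "compute,utility"}}
	fmt.Println(applyVulkanAnnotation(ann, env))
}
```

Running the function twice on the same container leaves the env unchanged, which matches the idempotency case listed in the unit-test coverage.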

Test plan

go test ./pkg/device/nvidia/... — env injection unit tests pass.
make docker builds the image with libvulkan-dev and ships hami.json.
Deploy pod with hami.io/vulkan: "true" annotation → HAMI_VULKAN_ENABLE=1 env present, NVIDIA_DRIVER_CAPABILITIES contains graphics, /etc/vulkan/implicit_layer.d/hami.json mounted.
Deploy pod without the annotation → unmodified (regression check).
E2E: ran a Vulkan workload (Isaac Sim) with nvidia.com/gpumem limit; the Kit boot log reports the exact partition size and the workload is held to it.

Verified on

Hardware: NVIDIA RTX 6000 Ada × 2 (driver 550-series).
K8s: v1.34.3.
Integration: tested with both stock HAMi (3-tier: webhook + scheduler + device-plugin) and a webhook-only deployment co-existing with Volcano scheduler. The Vulkan changes are orthogonal to scheduling — they only depend on the webhook + device-plugin path.

Compatibility / Breaking changes

None for existing CUDA workloads — the Vulkan code paths are gated behind the annotation and the enable_environment runtime guard.
New container env (HAMI_VULKAN_ENABLE) and new mount path (/etc/vulkan/implicit_layer.d/hami.json) are added only for opted-in pods.

Notes for reviewers

The submodule change to xiilab/HAMi-core@vulkan-layer is the only blocker for upstream merge; happy to open the companion PR against Project-HAMi/HAMi-core once direction is agreed (e.g. accept as a new branch, fold into a release, or restructure as a build flag).
This branch carries a few internal planning files under docs/superpowers/ (Korean-language design and implementation plans) that I can drop in a cleanup commit if reviewers prefer a leaner diff.
The webhook code at pkg/scheduler/webhook.go:64-69 (the existing scheduler-name skip check) has a known operator-precedence issue in v2.8.x, which #1163 ("fix: Add option for overwrite schedulerName") fixed on master. This PR does not touch that block.
The device-plugin change is intentionally null-safe: if the host doesn't have hami.json (e.g. the user opted out of running the manifest installer), the Allocate response is unchanged.

Happy to split into smaller PRs (HAMi-core layer / Dockerfile / device-plugin mount / webhook env / docs) if that's easier to review.

hami-robot · 2026-04-27T05:20:18Z

Welcome @100milliongold! It looks like this is your first PR to Project-HAMi/HAMi 🎉

Adds design spec for extending HAMi's NVIDIA vGPU partitioning to Vulkan workloads. CUDA and Vulkan share the existing nvidia.com/gpumem and nvidia.com/gpucores budgets, gated by the hami.io/vulkan annotation. Interception is implemented as a Vulkan implicit layer exposed by libvgpu.so (HAMi-core), sharing in-process counters with the existing CUDA hooks. Signed-off-by: Jea-Eok-Kim <je.kim@xiilab.com>