
CUDA out of memory #452

@ck-amrahd


Hi guys,

I am trying to run the Qwen 3B Instruct model on a GPU with 24 GB of VRAM, but when vLLM captures CUDA graphs, it runs out of memory. It seems we can set vLLM's `gpu_memory_utilization` config to around 0.7 to leave some headroom on the GPU. Is there a way to pass this flag during backend initialization?

Another interesting detail: this happens when I run it on Databricks with a 24 GB VRAM GPU, but on a local machine with an RTX 3090 it runs fine. I'm not sure what the cause is. Thank you.
