Conversation

@MohamedKHALILRouissi

This PR upgrades vLLM to version 0.12.0 and removes FlashInfer from the Dockerfile.

As part of the update, I removed all uses of get_model_config() from OpenAIServingModels, OpenAIServingChat, and OpenAIServingCompletion.
In vLLM 0.12.0, this method has been made internal and replaced by the model_config attribute, so calling it directly breaks on the newer version.
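
A minimal sketch of the pattern, with illustrative names rather than the exact vLLM source:

# Sketch only: 'engine' stands in for whatever engine object the serving
# classes receive; the attribute access is the point, not the names.
def load_model_config(engine):
    # Before vLLM 0.12.0 the serving classes fetched the config via a helper:
    #     model_config = await engine.get_model_config()
    # In 0.12.0 that helper is internal, so read the attribute directly:
    return engine.model_config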

The updated Dockerfile has been fully rebuilt and deployed, and the DeepSeek OCR functionality is working correctly after these changes.

This PR also addresses the issues discussed in:
#247
#245

Thanks @TimPietruskyRunPod for the serverless platform!

@samuelexferri

Hi, thanks for the commit. I attempted to deploy to a new endpoint using your forked repository. While the build was successful, the process is hanging indefinitely at the testing phase without any logs. Have you tested this? Did you need to configure any specific environment variables?

@samuelexferri

I was able to use the template successfully by making these changes:

Loudsrl@5a08923

@MohamedKHALILRouissi

@samuelexferri I already deployed an endpoint and it's working fine.

For the build, use this step to bake the model into the image (saves time and money):

DOCKER_BUILDKIT=1 docker build -t yourdockerreportname/worker-vllm:0.12.2 --build-arg MODEL_NAME="deepseek-ai/DeepSeek-OCR" --build-arg BASE_PATH="/models" .

where MODEL_NAME is the target model to deploy (the build and push take longer because the model is baked into the image).

Then create a new endpoint from a Docker template and use the environment variables from the RunPod vLLM worker template.

Live usage of DeepSeek-OCR:

[screenshot]
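
If you want to reproduce this, here is a minimal sketch of calling the deployed worker through RunPod's OpenAI-compatible route; the API key, endpoint ID, and image URL are placeholders to replace with your own:

from openai import OpenAI

# Placeholders: substitute your own RunPod API key and endpoint ID.
client = OpenAI(
    api_key="<RUNPOD_API_KEY>",
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
)

# Ask DeepSeek-OCR to read an image (the image URL is a placeholder).
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-OCR",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract all text from this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/doc.png"}},
        ],
    }],
)
print(response.choices[0].message.content)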

@samuelexferri
Copy link

Thank you @MohamedKHALILRouissi for your response!

I tried using a baked-in model inside the Docker image, but I'm still experiencing the same 1m 19s startup time despite an execution time of only 272ms.

Maybe I'm doing something wrong. Here are my steps:

  1. Build and push the Docker image:
# 1. Build for amd64
DOCKER_BUILDKIT=1 docker build --platform linux/amd64 -t myhub/worker-vllm-hunyuan-ocr:0.12.0 --build-arg MODEL_NAME="tencent/HunyuanOCR" --build-arg BASE_PATH="/models" .

# 2. Login
docker login

# 3. Tag as latest
docker tag myhub/worker-vllm-hunyuan-ocr:0.12.0 myhub/worker-vllm-hunyuan-ocr:latest

# 4. Push
docker push --all-tags myhub/worker-vllm-hunyuan-ocr
  2. Create the RunPod template with no environment variables (I only set the container disk to 50GB; I am not using a network volume).

  3. Deploy the serverless endpoint using that template with no env vars.

Thanks for your help!
