Conversation

@MohamedKHALILRouissi

This PR upgrades vLLM to version 0.12.0 and removes FlashInfer from the Dockerfile.

As part of the update, I removed all uses of get_model_config() from OpenAIServingModels, OpenAIServingChat, and OpenAIServingCompletion.
In vLLM 0.12.0, this method has been made internal and replaced by the model_config attribute, so calling it directly breaks on the newer version.
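
A minimal sketch of the pattern, with illustrative names rather than the exact vLLM source:

# Sketch only: 'engine' stands in for whatever engine object the serving
# classes receive; the attribute access is the point, not the names.
def load_model_config(engine):
    # Before vLLM 0.12.0 the serving classes fetched the config via a helper:
    #     model_config = await engine.get_model_config()
    # In 0.12.0 that helper is internal, so read the attribute directly:
    return engine.model_config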

The updated Dockerfile has been fully rebuilt and deployed, and the DeepSeek OCR functionality is working correctly after these changes.

This PR also addresses the issues discussed in:
#247
#245

Thanks @TimPietruskyRunPod for the serverless platform!

@samuelexferri

Hi, thanks for the commit. I attempted to deploy to a new endpoint using your forked repository. While the build was successful, the process is hanging indefinitely at the testing phase without any logs. Have you tested this? Did you need to configure any specific environment variables?

@samuelexferri

I was able to use the template successfully by making these changes:

Loudsrl@5a08923

@MohamedKHALILRouissi

@samuelexferri I already deployed an endpoint and it's working fine.

For the build, use this step to bake the model into the image (saves time and money):

DOCKER_BUILDKIT=1 docker build -t yourdockerreportname/worker-vllm:0.12.2 --build-arg MODEL_NAME="deepseek-ai/DeepSeek-OCR" --build-arg BASE_PATH="/models" .

where MODEL_NAME is the target model to deploy (the build and push take longer because the model is baked into the image).

Then create a new endpoint from a Docker template and use the environment variables from the RunPod vLLM worker template.

Live usage of DeepSeek-OCR:

[screenshot]
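
If you want to reproduce this, here is a minimal sketch of calling the deployed worker through RunPod's OpenAI-compatible route; the API key, endpoint ID, and image URL are placeholders to replace with your own:

from openai import OpenAI

# Placeholders: substitute your own RunPod API key and endpoint ID.
client = OpenAI(
    api_key="<RUNPOD_API_KEY>",
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
)

# Ask DeepSeek-OCR to read an image (the image URL is a placeholder).
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-OCR",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract all text from this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/doc.png"}},
        ],
    }],
)
print(response.choices[0].message.content)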

@samuelexferri
Copy link

Thank you @MohamedKHALILRouissi for your response!

I tried using a baked-in model inside the Docker image, but I'm still experiencing the same 1m 19s startup time despite an execution time of only 272ms.

Maybe I'm doing something wrong. Here are my steps:

  1. Build and push the Docker image:
# 1. Build for amd64
DOCKER_BUILDKIT=1 docker build --platform linux/amd64 -t myhub/worker-vllm-hunyuan-ocr:0.12.0 --build-arg MODEL_NAME="tencent/HunyuanOCR" --build-arg BASE_PATH="/models" .

# 2. Login
docker login

# 3. Tag as latest
docker tag myhub/worker-vllm-hunyuan-ocr:0.12.0 myhub/worker-vllm-hunyuan-ocr:latest

# 4. Push
docker push --all-tags myhub/worker-vllm-hunyuan-ocr
  2. Create the RunPod template with no environment variables (I only set the container disk to 50GB; I am not using a network volume).

  3. Deploy the serverless endpoint using that template with no env vars.

Thanks for your help!
