MiMo-VL-7B-RL Quick API Script Guide

The MiMoVLM-api_server.py script in this project deploys MiMo-VL-7B-RL, Xiaomi's open-source multi-modal Vision-Language Model (VLM), behind an HTTP API so it can be called quickly. It supports image captioning and other multi-modal reasoning tasks.

Model Overview

MiMo-VL-7B-RL is a vision-language model released by Xiaomi's large model team, with image understanding, reasoning, and multi-modal dialogue capabilities. The model combines a native-resolution ViT encoder, an MLP projector, and the MiMo-7B language model. It is optimized through multi-stage pre-training and mixed reinforcement learning, and its authors report state-of-the-art results on several public benchmarks.

Model Homepage & Technical Report: Xiaomi MiMo-VL GitHub
Model Weights Download: ModelScope Download Link

Main Features

Supports image captioning via image URL or local file upload
Customizable prompt (instruction) support
Standard RESTful API interface for easy integration
Automatic management of temporary files, suitable for high concurrency

Quick Start

Prepare Model Weights
- Download the model weights from the ModelScope download page and extract them to the directory that MODEL_PATH points to in MiMoVLM-api_server.py (default: /hy-tmp/data/MiMo-VL-7B-RL). Edit MODEL_PATH if you store the weights elsewhere.
Install Dependencies
```
pip install -r requirements.txt
```
Start the API Service
```
python MiMoVLM-api_server.py
```
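After startup, the service listens on http://0.0.0.0:8000, that is, port 8000 on all network interfaces. Clients on the same machine can reach it at http://localhost:8000.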
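
Loading a 7B model can take a while, so a client script may want to wait until the port accepts connections before sending requests. The following is a minimal sketch of such a check using only the Python standard library. The helper name `wait_for_port` and its default values are illustrative and are not part of this project.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=8000, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: it only checks that the port accepts connections,
    not that the model has finished loading.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


# Example usage (requires the server to be starting or running):
# if not wait_for_port(port=8000, timeout=300):
#     raise SystemExit("API service did not come up in time")
```

A successful TCP connection only shows that the HTTP server is up. If the script loads the model lazily, the first request may still be slow.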

API Endpoints

1. Image URL Captioning

Endpoint: POST /describe_url/

Request Body:

{
  "image_url": "URL of the image",
  "prompt_text": "(Optional) Custom prompt"
}

Response Example:

{
  "description": "Image caption",
  "prompt_used": "Prompt actually used",
  "error": null
}
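
A client call to this endpoint might look like the sketch below. It assumes the request body is sent as JSON (`Content-Type: application/json`) and that the service is reachable at `http://localhost:8000`. The function names are illustrative, and the script's actual validation and error behavior may differ.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: service runs locally on the default port


def build_describe_url_request(image_url, prompt_text=None, base_url=BASE_URL):
    """Build a POST request for /describe_url/ (hypothetical helper)."""
    payload = {"image_url": image_url}
    if prompt_text is not None:
        # prompt_text is optional; omit it to let the server use its default prompt.
        payload["prompt_text"] = prompt_text
    return urllib.request.Request(
        f"{base_url}/describe_url/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption: JSON body
        method="POST",
    )


def describe_url(image_url, prompt_text=None, timeout=120):
    """Send the request and return the caption, raising if 'error' is set."""
    req = build_describe_url_request(image_url, prompt_text)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    if body.get("error"):
        raise RuntimeError(body["error"])
    return body["description"]


# Example usage (requires a running server; the image URL is a placeholder):
# print(describe_url("https://example.com/cat.jpg", "Describe this image briefly."))
```

The generous timeout is a guess: captioning with a 7B model can take several seconds per image depending on hardware.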

2. Image File Upload Captioning

Endpoint: POST /describe_upload/
Request Body (multipart/form-data):
- image: The uploaded image file
- prompt_text: (Optional) Custom prompt
Response: Same format as the Image URL Captioning response (description, prompt_used, error)
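
The upload endpoint expects form-data, so the client has to send a multipart body. The sketch below builds one with the Python standard library only. It assumes the field names `image` and `prompt_text` listed above and a service at `http://localhost:8000`. The helper names are illustrative and not part of this project.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

BASE_URL = "http://localhost:8000"  # assumption: service runs locally on the default port


def encode_multipart(fields, files):
    """Encode text fields and files as multipart/form-data.

    fields: {name: str}, files: {name: (filename, bytes)}.
    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode("utf-8"),
        ]
    for name, (filename, content) in files.items():
        ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'.encode(),
            f"Content-Type: {ctype}".encode(),
            b"",
            content,
        ]
    parts += [f"--{boundary}--".encode(), b""]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def build_upload_request(image_name, image_bytes, prompt_text=None, base_url=BASE_URL):
    """Build a POST request for /describe_upload/ (hypothetical helper)."""
    fields = {} if prompt_text is None else {"prompt_text": prompt_text}
    body, content_type = encode_multipart(fields, {"image": (image_name, image_bytes)})
    return urllib.request.Request(
        f"{base_url}/describe_upload/",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


def describe_upload(path, prompt_text=None, timeout=120):
    """Upload a local image and return the parsed JSON response."""
    with open(path, "rb") as f:
        req = build_upload_request(os.path.basename(path), f.read(), prompt_text)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage (requires a running server and a local file):
# print(describe_upload("uploaded-images/example.jpg")["description"])
```

If a third-party HTTP client is already installed, its built-in file upload support is usually shorter than hand-encoding the body. The standard-library version is shown here only to avoid an extra dependency.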

Dependencies

See requirements.txt for details.

Acknowledgement

This project is based on the open-source MiMo-VL project by Xiaomi Large Model Team. Special thanks!

For more technical details, please refer to the MiMo-VL Technical Report.
