
feat: add MiniMax as LLM provider in caption annotation pipeline#33

Open
octo-patch wants to merge 1 commit into NJU-3DV:main from octo-patch:feature/add-minimax-provider

Conversation

@octo-patch

Summary

This PR adds MiniMax as a supported LLM provider in the SpatialVID caption annotation pipeline.

Changes

  • caption/utils/api_call.py: Detect api.minimax.io in base_domain and route to MiniMax's OpenAI-compatible endpoint (/v1/chat/completions). Temperature is kept at 0.1, within MiniMax's required (0.0, 1.0] range. The duplicated response-extraction branches are unified into a single return path. See the routing sketch after this list.
  • caption/README.md: Add a Supported LLM Providers table listing MiniMax, Qwen, and Gemini with their --base_domain values, plus concrete MiniMax usage examples for LLM/inference.py and tagging/inference.py.
  • caption/tests/test_api_call.py: 12 unit tests (mock-based) + 3 integration tests against the live MiniMax API, covering endpoint routing, temperature constraints, auth headers, multimodal list content, and backward compatibility with Qwen/Gemini. A representative test sketch appears below the routing sketch.
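
A minimal sketch of the detection-and-routing logic. The function and variable names are illustrative assumptions, not the PR's exact code; only the api.minimax.io check and the /v1/chat/completions target come from the description above.

from urllib.parse import urlparse

def resolve_endpoint(base_domain: str) -> str:
    """Route to MiniMax's OpenAI-compatible chat endpoint when base_domain
    points at api.minimax.io; leave other providers' domains untouched."""
    host = urlparse(base_domain).netloc or base_domain
    if "api.minimax.io" in host:
        # e.g. https://api.minimax.io/v1/ -> https://api.minimax.io/v1/chat/completions
        return base_domain.rstrip("/") + "/chat/completions"
    return base_domain  # unchanged behavior for existing Qwen/Gemini paths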
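
And a representative unit test in the spirit of the suite described above (the import path, test names, and the non-MiniMax example domain are assumptions, and the PR's actual suite also mocks the HTTP layer):

from caption.utils.api_call import resolve_endpoint  # hypothetical import path

def test_minimax_base_domain_routes_to_chat_completions():
    assert (resolve_endpoint("https://api.minimax.io/v1/")
            == "https://api.minimax.io/v1/chat/completions")

def test_non_minimax_base_domain_is_untouched():
    # backward compatibility: Qwen/Gemini-style domains pass through unchanged
    url = "https://example-provider.com/v1/"
    assert resolve_endpoint(url) == url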

Usage

# LLM Captioning with MiniMax M2.7
python caption/LLM/inference.py \
  --csv_path path/to/data.csv \
  --pose_load_dir path/to/poses \
  --output_dir path/to/output \
  --model MiniMax-M2.7 \
  --api_key $MINIMAX_API_KEY \
  --base_domain https://api.minimax.io/v1/

MiniMax models supported:

  • MiniMax-M2.7 — Peak Performance. Ultimate Value. Master the Complex (204K context)
  • MiniMax-M2.7-highspeed — Same performance, faster and more agile (204K context)

API keys: platform.minimax.io
API reference: https://platform.minimax.io/docs/api-reference/text-openai-api
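
For reference, requests to this endpoint follow the OpenAI-compatible chat schema. A direct-call sketch (the exact payload api_call.py builds is not shown in this PR, so everything here beyond the endpoint, model, and temperature is an assumption):

import os
import requests

resp = requests.post(
    "https://api.minimax.io/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['MINIMAX_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "MiniMax-M2.7",
        "temperature": 0.1,  # must fall within MiniMax's (0.0, 1.0] range
        "messages": [{"role": "user", "content": "Describe the camera motion in this clip."}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])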

Commit message

- Add MiniMax provider detection (api.minimax.io) in caption/utils/api_call.py
  with proper OpenAI-compatible endpoint routing
- Temperature is set to 0.1, within MiniMax's required (0.0, 1.0] range
- Unify response extraction path (both branches used identical logic)
- Add Supported LLM Providers table and MiniMax usage examples in caption/README.md
- Add caption/tests/test_api_call.py: 12 unit tests + 3 integration tests
  covering MiniMax endpoint, temperature, auth header, multimodal content,
  and backward compatibility for Qwen/Gemini providers
