fix: resolve false "plan limit reached" error for cloud models#1251

Open
Dang-Ver wants to merge 2 commits into eigent-ai:main from Dang-Ver:fix/false-plan-limit-error

Conversation

@Dang-Ver

Summary

  • Fix incorrect model_platform mapping for cloud models routed through LiteLLM proxy — non-GPT models (Gemini, Claude, Minimax) were using native provider SDKs (gemini, anthropic) instead of openai-compatible-model, causing protocol mismatches that were misinterpreted as budget errors
  • Remove overly broad " 429" heuristic in error_format.py that falsely classified HTTP 429 rate limits as insufficient_quota errors, and add a separate rate_limit_exceeded handler
  • Add missing cloud models (GPT-5/5.1/5.2/5-mini, Gemini 3 Flash, Claude Sonnet 4-5) to backend ModelType enum
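The platform-mapping fix described in the first bullet can be sketched as follows. This is an illustrative reconstruction, not the repository's actual code: the function name `resolve_model_platform`, the `NATIVE_PLATFORMS` table, and the platform strings other than `openai-compatible-model` are assumptions.

```python
# Illustrative sketch of the fix: models routed through the LiteLLM
# proxy must speak the OpenAI-compatible protocol, not their native
# provider SDKs, or proxy responses are misparsed as budget errors.
# All names here are hypothetical.

NATIVE_PLATFORMS = {
    "gemini": "gemini",       # direct Google SDK
    "claude": "anthropic",    # direct Anthropic SDK
    "minimax": "minimax",     # direct Minimax SDK
}

def resolve_model_platform(model_name: str, via_litellm_proxy: bool) -> str:
    """Pick the model_platform key for a cloud model.

    Before the fix, proxied non-GPT models fell through to their
    native platform and hit a protocol mismatch; after the fix, any
    proxied model uses the OpenAI-compatible platform.
    """
    if via_litellm_proxy:
        return "openai-compatible-model"
    prefix = model_name.split("-", 1)[0].lower()
    return NATIVE_PLATFORMS.get(prefix, "openai")
```

With this shape, `resolve_model_platform("gemini-3-pro", via_litellm_proxy=True)` yields `"openai-compatible-model"` instead of `"gemini"`, which is the behavior change the summary describes.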

Test plan

  • Select Minimax M2.5 as cloud model and send a message — should no longer show "plan limit reached"
  • Select Gemini 3 Pro as cloud model and send a message — should work without false quota error
  • Select GPT 5.2 as cloud model and send a message — should work as before
  • Select GPT 4.1 as cloud model and send a message — verify no regression
  • Trigger an actual rate limit (429) and verify the new "Rate limit exceeded" message appears instead of "exceeded your quota"
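The error-classification change being tested above can be sketched like this. The function name, category strings, and matching rules are assumptions for illustration; the actual `error_format.py` logic may differ.

```python
# Hypothetical sketch of the error_format.py change: drop the broad
# " 429" substring heuristic and classify explicit quota errors
# separately from HTTP 429 rate limits.

def classify_api_error(status_code: int, body: str) -> str:
    """Map an upstream API error to a user-facing category.

    Previously, any error text containing " 429" was treated as
    insufficient_quota and surfaced as "plan limit reached"; now
    only explicit quota errors are, and 429s get their own handler.
    """
    if "insufficient_quota" in body:
        return "insufficient_quota"    # genuine plan/budget errors
    if status_code == 429:
        return "rate_limit_exceeded"   # transient rate limiting
    return "unknown_error"             # pass through everything else
```

Under this sketch, a plain HTTP 429 produces the new "Rate limit exceeded" message rather than the false "exceeded your quota" one, matching the last test-plan item.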

@Dang-Ver force-pushed the fix/false-plan-limit-error branch from 226e847 to 3351732 on February 14, 2026 at 11:46
@Dang-Ver (Author)

@Wendong-Fan, please help me with this issue.

@Wendong-Fan (Contributor) left a comment


Thanks @Dang-Ver for the fix! I think a better solution would be to remove the error-formatting module from the original code. In most cases, the API already provides a clear and informative error message. Adding this extra formatting layer introduces additional maintenance overhead and potential bugs.

It would be great if you’re interested in refactoring the current implementation to simply preserve and return the original error message from the API response.
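The reviewer's suggestion of preserving the original API error could look roughly like this. The function name and JSON shape are assumptions: most OpenAI-compatible providers return an `{"error": {"message": ...}}` body, but the repo's actual response handling may differ.

```python
# Sketch of the suggested refactor: instead of reclassifying and
# rewording errors, surface the provider's own message verbatim.
# Names and the fallback text are illustrative.

import json

def extract_api_error_message(raw_body: str, fallback: str = "Request failed") -> str:
    """Return the upstream API's original error message.

    Tries the common {"error": {"message": ...}} shape first; if the
    body is not JSON in that shape, returns the raw text as-is.
    """
    try:
        payload = json.loads(raw_body)
        return payload.get("error", {}).get("message") or fallback
    except (json.JSONDecodeError, AttributeError):
        return raw_body.strip() or fallback
```

This removes the formatting layer entirely, so a future provider-specific error wording cannot be misclassified the way the " 429" heuristic misclassified rate limits.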
