
feat: add LiteLLM as AI gateway provider#514

Open
RheagalFire wants to merge 3 commits into AsyncFuncAI:main from RheagalFire:feat/add-litellm-provider

Conversation


@RheagalFire RheagalFire commented Apr 29, 2026

Summary

  • Adds LiteLLM as a new AI gateway provider, routing to 100+ LLM providers via a single LiteLLMClient
  • Follows the existing ModelClient pattern used by OpenAI, Bedrock, OpenRouter, etc.
  • Closes #471 (Integration with LiteLLM)

Motivation

DeepWiki currently ships separate client files for each provider (OpenAI, Bedrock, Azure, OpenRouter, DashScope, Ollama). Users who want to use providers not covered by these clients (Anthropic direct, Groq,
Together, Fireworks, Mistral, etc.) have no path today. Issue #471 describes exactly this: a user trying to connect via LiteLLM proxy to vLLM couldn't get it working.

LiteLLM routes to 100+ providers through a single unified interface. Adding it as a native provider means users can access any supported provider by setting the appropriate env var and model name.
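As a hedged illustration of that mechanism (the env-var name and model prefix below follow LiteLLM's documented conventions; `GROQ_API_KEY` and the `groq/` prefix are examples, not part of this PR):

```python
import os

# LiteLLM reads provider credentials from provider-specific environment
# variables and routes each request based on the model-name prefix.
os.environ["GROQ_API_KEY"] = "sk-example"   # illustrative placeholder key
model = "groq/llama-3.1-8b-instant"         # the "groq/" prefix selects the Groq provider

provider = model.split("/", 1)[0]
print(provider)  # groq
```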

Changes

  • api/litellm_client.py -- new LiteLLMClient(ModelClient) with sync call(), async acall(), embedding support, and streaming support
  • api/config.py -- registered LiteLLMClient in CLIENT_CLASSES and default_map
  • api/config/generator.json -- added litellm provider entry with supportsCustomModel: true
  • api/pyproject.toml -- added litellm>=1.60.0,<2.0 as optional dependency
  • tests/unit/test_litellm_client.py -- 18 unit tests

Usage and testing

1. Unit tests (17 passed, 1 skipped for missing boto3):

tests/unit/test_litellm_client.py::TestLiteLLMClientInit::test_default_init PASSED
tests/unit/test_litellm_client.py::TestLiteLLMClientInit::test_init_with_params PASSED
tests/unit/test_litellm_client.py::TestConvertInputs::test_llm_string_input PASSED
tests/unit/test_litellm_client.py::TestConvertInputs::test_llm_message_list_input PASSED
tests/unit/test_litellm_client.py::TestConvertInputs::test_embedder_string_input PASSED
tests/unit/test_litellm_client.py::TestConvertInputs::test_embedder_list_input PASSED
tests/unit/test_litellm_client.py::TestConvertInputs::test_unsupported_model_type PASSED
tests/unit/test_litellm_client.py::TestCallMocked::test_completion_dispatches_correctly PASSED
tests/unit/test_litellm_client.py::TestCallMocked::test_embedding_dispatches_correctly PASSED
tests/unit/test_litellm_client.py::TestCallMocked::test_api_key_forwarded_when_set PASSED
tests/unit/test_litellm_client.py::TestCallMocked::test_api_key_omitted_when_blank PASSED
tests/unit/test_litellm_client.py::TestCallMocked::test_base_url_forwarded_when_set PASSED
tests/unit/test_litellm_client.py::TestParseCompletion::test_parse_chat_completion PASSED
tests/unit/test_litellm_client.py::TestParseCompletion::test_track_usage PASSED
tests/unit/test_litellm_client.py::TestSerialization::test_from_dict PASSED
tests/unit/test_litellm_client.py::TestSerialization::test_to_dict_excludes_clients PASSED
tests/unit/test_litellm_client.py::TestConfigRegistration::test_litellm_provider_in_generator_config PASSED
======================== 17 passed, 1 skipped in 0.45s =========================

2. Live E2E against real provider (Anthropic via Azure AI Foundry):

from api.litellm_client import LiteLLMClient
import adalflow as adal

client = LiteLLMClient(
    api_key="<key>",
    base_url="https://amanrai-test-resource.services.ai.azure.com/anthropic",
)
gen = adal.Generator(
    model_client=client,
    model_kwargs={"model": "anthropic/claude-sonnet-4-6", "max_tokens": 50},
)
response = gen(prompt_kwargs={"input_str": "What is 2+2? Answer with just the number."})
print(response.data)  # "4"
print(response.usage)  # CompletionUsage(completion_tokens=5, prompt_tokens=73, total_tokens=78)
Additional checks:

  • Async completion verified: "Hello!"
  • Message list input verified: "Hi there!"

Example usage

Via generator.json config (recommended):
Set default_provider to litellm in api/config/generator.json and configure any LiteLLM-supported model:

{
  "default_provider": "litellm",
  "providers": {
    "litellm": {
      "client_class": "LiteLLMClient",
      "default_model": "anthropic/claude-sonnet-4-20250514",
      "supportsCustomModel": true,
      "models": {
        "anthropic/claude-sonnet-4-20250514": {"temperature": 0.7}
      }
    }
  }
}

Via Python directly:

from api.litellm_client import LiteLLMClient
import adalflow as adal

# Uses ANTHROPIC_API_KEY from environment automatically
gen = adal.Generator(
    model_client=LiteLLMClient(),
    model_kwargs={"model": "anthropic/claude-sonnet-4-20250514"},
)
response = gen(prompt_kwargs={"input_str": "Explain this code"})

Risk / Compatibility

  • Additive only. Existing providers untouched.
  • litellm is an optional dependency -- base install unaffected.
  • drop_params=True ensures cross-provider compatibility by silently dropping unsupported kwargs.
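A pure-Python sketch of the drop_params semantics described above (SUPPORTED and this completion() are illustrative stand-ins; litellm is not required to see the behavior):

```python
# Minimal model of a provider that accepts only a fixed set of kwargs.
SUPPORTED = {"model", "messages", "temperature"}

def completion(drop_params=False, **kwargs):
    unknown = set(kwargs) - SUPPORTED
    if unknown and not drop_params:
        # without drop_params, unsupported kwargs are a hard failure
        raise TypeError(f"unsupported parameters: {sorted(unknown)}")
    # with drop_params=True, unsupported kwargs are silently discarded
    return {k: v for k, v in kwargs.items() if k in SUPPORTED}

ok = completion(drop_params=True, model="m", messages=[], top_k=5)

try:
    completion(model="m", messages=[], top_k=5)
    strict_failed = False
except TypeError:
    strict_failed = True
```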


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces the LiteLLMClient to integrate with numerous LLM providers, along with corresponding configuration updates and unit tests. The review feedback identifies several high-priority improvements for the new client, including addressing thread-safety concerns and incorrect field mappings during response parsing, particularly for streaming. Other recommendations include narrowing exception handling in retry logic to avoid unnecessary retries on logic errors, sanitizing logs to prevent the exposure of sensitive API data, and optimizing the configuration loading process by moving static dictionary definitions out of loops.

Five review comment threads (four on api/litellm_client.py, one on api/config.py) are marked Outdated.
@RheagalFire force-pushed the feat/add-litellm-provider branch from 7115d1f to 0e3789d on April 29, 2026 at 19:39.
@sashimikun

rlm-wiki review

PR Review: feat: add LiteLLM as AI gateway provider (#514)

Summary

This PR adds LiteLLMClient as a new AI gateway provider, routing to 100+ LLM providers through a single unified interface. The implementation follows the existing ModelClient pattern and is largely correct in structure and design — lazy litellm imports, graceful giveup logic for retry, and the streaming parsing approach are all solid. However, there are several blocking issues that must be fixed before merge: a hidden backoff dependency, a TypeError footgun in call()/acall(), a logical streaming+retry semantic mismatch, and dead init methods that mislead readers about how the client actually works. Several other minor gaps round out the review.


Issues

Critical

1. backoff is not declared as a direct dependency — only transitive

File: api/pyproject.toml / api/litellm_client.py:27
Evidence: import backoff is at module level and executes unconditionally at import time. pyproject.toml does not list backoff as a direct dependency (verified by grep -n 'backoff' api/pyproject.toml returning nothing). It appears in poetry.lock only as a transitive dependency of adalflow. If adalflow ever drops it as a transitive dependency, all deployments silently break. It must be declared explicitly:

# api/pyproject.toml — add to [tool.poetry.dependencies]
backoff = ">=2.2.1,<3.0.0"

2. TypeError if caller passes drop_params in model_kwargs

File: api/litellm_client.py:210-211, api/litellm_client.py:230-231
Evidence:

return litellm.completion(drop_params=True, **api_kwargs, **extra)

If api_kwargs (which comes from convert_inputs_to_api_kwargs, which shallow-copies model_kwargs) already contains the key drop_params, the call raises TypeError: completion() got multiple values for keyword argument 'drop_params'. Similarly, if api_kwargs contains api_key or api_base AND those same keys appear in extra, the double ** unpacking raises the same got multiple values for keyword argument error. Both are realistic user mistakes. Fix by merging dicts rather than stacking an explicit keyword on top of ** unpacking:

final_kwargs = {"drop_params": True, **api_kwargs, **extra}
return litellm.completion(**final_kwargs)

This correctly lets extra win over api_kwargs, avoids duplicate keyword errors, and is explicit.
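The failure mode and the fix can be reproduced without litellm (the completion below is a stand-in that just echoes its kwargs):

```python
def completion(**kwargs):
    # stand-in for litellm.completion: echoes what it received
    return kwargs

api_kwargs = {"model": "openai/gpt-4o", "drop_params": False}

# Explicit keyword plus ** unpacking of a dict holding the same key: TypeError.
try:
    completion(drop_params=True, **api_kwargs)
    raised = False
except TypeError:
    raised = True

# Dict-merge fix: later entries win, so no duplicate-keyword error is possible.
merged = {"drop_params": True, **api_kwargs}
result = completion(**merged)
```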


Major

3. backoff retry decorator has no effect on streaming failures

File: api/litellm_client.py:203, api/litellm_client.py:223
Evidence: When stream=True is passed in api_kwargs, litellm.acompletion() returns an async generator immediately (no exception at the awaited call site). The @backoff.on_exception decorator only sees the coroutine resolving successfully — it will never fire on mid-stream connection drops or rate-limit interruptions. The max_time=5 bound also means that even for non-streaming calls, an auto-retry window of 5 seconds total is extremely tight for any real rate-limit back-off. Consider documenting this limitation explicitly and/or separating streaming vs. non-streaming paths.
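The mismatch is easy to demonstrate without backoff or litellm: a retry decorator wrapping a generator-returning function only guards generator creation, never iteration (retry_once below is a toy stand-in for @backoff.on_exception):

```python
calls = []

def retry_once(fn):
    # toy retry decorator: retries the *call* once if the call itself raises
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception:
            return fn(*args, **kwargs)
    return wrapper

@retry_once
def stream():
    calls.append(1)
    def chunks():
        yield "chunk-1"
        raise ConnectionError("mid-stream drop")  # raised during iteration
    return chunks()  # returns immediately; the wrapper sees a "success"

s = stream()
try:
    list(s)  # the ConnectionError surfaces here, outside the retry wrapper
    error_escaped = False
except ConnectionError:
    error_escaped = True
```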

4. init_sync_client / init_async_client are dead code that mislead readers

File: api/litellm_client.py:113-117
Evidence: Both methods return plain dicts ({"api_key": ..., "base_url": ...}). These dicts are stored as self.sync_client and self.async_client but are never read inside call() or acall() — those methods rebuild extra from scratch using self._api_key / self._base_url. This creates a false impression that the client is managing a connection pool. Either remove these methods (calling super().__init__() is sufficient) or make them return the dict that call() actually uses so there is a single source of truth.

5. Module-level import of LiteLLMClient in core serving files before optional dep guard

File: api/config.py:16, api/simple_chat.py:21, api/websocket_wiki.py:26
Evidence:

from api.litellm_client import LiteLLMClient   # unconditional in all 3 files

While litellm itself is lazily imported inside call()/acall(), import backoff runs unconditionally at module load time in litellm_client.py. If backoff is ever unavailable (see finding #1), the entire server fails to start even for users who will never use LiteLLM. The standard pattern for optional provider imports is a guarded try/except at the top of litellm_client.py:

try:
    import backoff
    _BACKOFF_AVAILABLE = True
except ImportError:
    _BACKOFF_AVAILABLE = False

Or use a lazy wrapper so the dependency is only required when the client is actually instantiated.

6. from_dict silently drops chat_completion_parser

File: api/litellm_client.py:237-238
Evidence:

@classmethod
def from_dict(cls, data: Dict[str, Any]):
    return cls(**data)

If data was produced by to_dict() (which calls super().to_dict()), and the original client had a custom chat_completion_parser, that callable cannot be serialized to JSON and therefore cannot be restored. This is fine if to_dict() never emits it — but the round-trip contract is silently broken. Document this limitation clearly.
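A minimal sketch of the silent round-trip break (Client, its fields, and its to_dict are illustrative stand-ins, not the PR's actual class):

```python
import json

class Client:
    def __init__(self, base_url=None, chat_completion_parser=None):
        self.base_url = base_url
        # a custom parser is a callable and cannot be JSON-serialized
        self.chat_completion_parser = chat_completion_parser or (lambda r: r)

    def to_dict(self):
        # only JSON-safe fields survive; the parser is silently omitted
        return {"base_url": self.base_url}

    @classmethod
    def from_dict(cls, data):
        return cls(**data)

original = Client(base_url="http://localhost:4000", chat_completion_parser=str.upper)
payload = json.dumps(original.to_dict())          # what gets persisted
restored = Client.from_dict(json.loads(payload))  # parser falls back to the default
```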

7. Streaming calls in simple_chat.py and websocket_wiki.py bypass parse_chat_completion entirely

File: api/simple_chat.py:572-580, api/websocket_wiki.py:723-732
Evidence: The integration code manually unpacks chunk deltas inline rather than routing through LiteLLMClient.parse_chat_completion. This means the handle_streaming_response generator logic in the client class is never exercised by the actual serving path — only call()/acall() and raw chunk iteration are used. The streaming parser path in litellm_client.py is tested in isolation but never called in production. This is duplicated fragile logic; it should use handle_streaming_response from the client.


Minor

8. _is_retryable uses string module+name matching — fragile

File: api/litellm_client.py:40-47
Evidence:

qualname = f"{type(exc).__module__}.{type(exc).__name__}"

This approach will fail silently if LiteLLM ever renames its exception module or subclasses exceptions. The more idiomatic and robust pattern is:

try:
    import litellm
    _RETRYABLE = (
        litellm.exceptions.RateLimitError,
        litellm.exceptions.ServiceUnavailableError,
        litellm.exceptions.Timeout,
        litellm.exceptions.APIConnectionError,
        litellm.exceptions.InternalServerError,
    )
    def _is_retryable(exc): return isinstance(exc, _RETRYABLE)
except ImportError:
    def _is_retryable(exc): return False

9. LiteLLMClient() instantiation does not forward api_key/base_url from config

File: api/simple_chat.py:453, api/websocket_wiki.py:564
Evidence:

model = LiteLLMClient()   # always uses defaults

Both serving paths always construct LiteLLMClient() with no arguments, meaning the api_key and base_url fields in generator.json are ignored even if a user configures them there. This is inconsistent with how other providers (e.g. AzureAIClient) forward config into the constructor.

10. supportsCustomModel: true with default_model: "openai/gpt-4o" in generator.json

File: api/config/generator.json
Evidence: The default model in the LiteLLM provider config routes to OpenAI. A user who installs LiteLLM to access a non-OpenAI provider will still hit OpenAI's endpoint unless they explicitly override the model name. A cheaper default like "openai/gpt-4o-mini", or no default model at all (requiring explicit configuration), would be safer.


Nitpick

11. Unused imports in litellm_client.py

File: api/litellm_client.py:22, 26

from typing import (
    ...
    TypeVar,   # T is defined but never used
    Union,     # never referenced
)
T = TypeVar("T")   # dead

Remove TypeVar, T = TypeVar("T"), and Union.

12. init_async_client is called nowhere

File: api/litellm_client.py:116
init_async_client is defined but never called — self.async_client is set to None in __init__ and never updated. This further confirms these methods are dead code (see finding #4).

13. Log level of the api_kwargs debug message (previously flagged): resolved

File: api/litellm_client.py:208

log.debug(f"api_kwargs: {api_kwargs}")

The log is correctly at DEBUG level (this was flagged in the previous review and has been addressed). ✓


Suggestions

Combine init methods and make call/acall use them:

def init_sync_client(self):
    # Remove or make return actual config used by call()
    pass

def call(self, api_kwargs=None, model_type=ModelType.UNDEFINED):
    import litellm
    api_kwargs = api_kwargs or {}
    extra = {}
    if self._api_key:
        extra["api_key"] = self._api_key
    if self._base_url:
        extra["api_base"] = self._base_url
    merged = {"drop_params": True, **api_kwargs, **extra}
    if model_type == ModelType.EMBEDDER:
        return litellm.embedding(**merged)
    elif model_type == ModelType.LLM:
        return litellm.completion(**merged)
    raise ValueError(f"model_type {model_type} is not supported")

Guard the backoff import:

try:
    import backoff as _backoff
    def _with_retry(fn):
        return _backoff.on_exception(
            _backoff.expo, Exception, max_time=60,
            giveup=lambda e: not _is_retryable(e)
        )(fn)
except ImportError:
    def _with_retry(fn):
        return fn

Consider max_time=60 (or configurable) — the current max_time=5 with exponential backoff provides no meaningful retry window for rate limits.


Positive Highlights

  • Lazy import litellm inside call()/acall() is the correct pattern for optional heavy dependencies — this means importing litellm_client.py itself doesn't load all of litellm's startup machinery.
  • drop_params=True is thoughtful cross-provider compatibility; silently dropping unsupported kwargs prevents hard failures when switching providers.
  • _is_retryable giveup predicate correctly avoids retrying non-transient errors (auth failures, bad model names). The design intent is sound even if the implementation is string-based.
  • convert_inputs_to_api_kwargs handles string, tagged-message, and pre-built message-list inputs cleanly, and the tagged-message fallback path is well-tested.
  • 18 unit tests with strong coverage of init, input conversion, call dispatch, streaming, serialization, and retry predicates — a good foundation even though some paths (streaming end-to-end, fallback path in simple_chat) aren't covered yet.
  • Config refactor in config.py (default_map moved outside the loop) is a real improvement over the original code structure.

Residual Test Risk

The tests do not cover:

  • The actual serving integration paths in simple_chat.py and websocket_wiki.py (inline chunk-unpacking logic is untested).
  • The fallback path (simplified prompt retry) for LiteLLM.
  • from_dict / to_dict round-trip fidelity when a base_url is set.
  • The TypeError from duplicate keyword arguments (finding #2).
  • Behavior when litellm is not installed (import-time error surface).

Posted from rlm-wiki.
