Overview
I spent time reading through the core source — prompt_compressor.py (~2,455 lines), utils.py, and the examples — as someone integrating LLMLingua into a RAG pipeline. Overall this is impressively well-documented research code. Here are observations, questions, and a few concrete suggestions.
Strengths Worth Calling Out
- Docstrings are thorough. The parameter-level documentation in
compress_prompt and structured_compress_prompt is genuinely useful — the math definition of compression rate and the caveats about tokenizer mismatch are the kind of context users actually need.
- The
__call__ delegation to compress_prompt is a nice UX touch — lets users treat the compressor as a callable without remembering method names.
rank_method extensibility is well-designed; adding a new ranking backend (BM25, Cohere, sentence-BERT, etc.) is a clean one-function addition inside get_rank_results.
Questions & Observations
1. Mutable default arguments in function signatures
Several public methods use mutable defaults:
def compress_prompt(self, ..., force_tokens: List[str] = [], ...):
def compress_prompt_llmlingua2(self, ..., force_context_ids: List[int] = [], force_tokens: List[str] = [], ...):
Python shares the same list object across all calls when the default is a mutable literal. This is a classic footgun — if any call path appends to force_tokens, subsequent calls with the default will see the accumulated state. The standard fix is force_tokens: List[str] = None with an early if force_tokens is None: force_tokens = [] guard. Given the library's public API surface, this seems worth hardening.
2. trust_remote_code=True on by default
In load_model:
trust_remote_code = model_config.get("trust_remote_code", True)
The default is True, meaning any model loaded from HuggingFace Hub will execute arbitrary remote code without the user opting in. For a library with a broad user base (LangChain, LlamaIndex integrations), this seems like a surprising default. Most downstream frameworks default this to False and require explicit opt-in. Would a change to default False break any of the recommended models, or is it only needed for specific quantized variants?
3. TokenClfDataset hardcodes model name string matching
if "bert-base-multilingual-cased" in model_name:
...
elif "xlm-roberta-large" in model_name:
...
else:
raise NotImplementedError()
The same string-matching pattern appears in is_begin_of_new_word and get_pure_token in utils.py. This means any new model (even a fine-tune like my-org/xlm-roberta-large-finetuned) may silently route to the wrong branch or raise. A model family enum or a config-driven approach would make adding new model families less error-prone and avoid the current reliance on substring matching.
4. saving is hardcoded to GPT-4 pricing
saving = (origin_tokens - compressed_tokens) * 0.06 / 1000
The $0.06/1K tokens figure (GPT-4 input pricing circa 2023) is baked into the return value of both compress_prompt and compress_prompt_llmlingua2. GPT-4o pricing is now ~10x cheaper, and users routing to other models will see a misleading savings estimate. Options: accept an optional cost_per_token parameter, document the assumption clearly in the return value docstring, or drop this field in favor of letting callers compute savings from the raw token counts.
5. process_sequence_data has a typo — dict values use [ brackets
elif value_type == "dict" or value_type == "dictionary":
return (
"<llmlingua, compress=False>"
+ f'"{k}": {process_sequence_data(rate, "[", "]", v, is_dict=True)}'
)
A JSON dict should be wrapped in {/}, not [/]. This may be intentional for the internal markup format, but if so it would benefit from a comment explaining why, since it looks like a bug to a reader unfamiliar with the internal representation.
6. get_rank_results nests all ranking functions as inner closures
The twelve get_distance_* functions are all defined as closures inside get_rank_results and re-created on every call. This isn't a correctness issue, but it makes the function ~200 lines long and prevents unit-testing individual ranking methods in isolation. Extracting them as private methods or a strategy registry would improve testability and readability.
Minor Suggestions
- The
context_budget parameter accepts a string like "+100" but this is not validated — a malformed string would produce a silent arithmetic error downstream. A brief validation or an explicit int | str union type with parsing docs would help.
seed_everything(42) is called unconditionally in init_llmlingua2, which globally sets torch, numpy, and random seeds. This is fine for reproducibility in research but is a side-effect that library users may not expect. Worth noting in the docstring.
Summary
The library is well-written research code with production integration already in LangChain and LlamaIndex. The main areas I'd prioritize for a stability pass are the mutable default argument issue (correctness risk), the trust_remote_code default (security), and the hardcoded pricing constant (staleness). The string-matching model dispatch is a longer-term maintainability concern.
Thanks for the work on this — the LongLLMLingua context-reordering approach in particular is elegant.
— A developer exploring LLMLingua for RAG cost optimization
Overview
I spent time reading through the core source —
prompt_compressor.py(~2,455 lines),utils.py, and the examples — as someone integrating LLMLingua into a RAG pipeline. Overall this is impressively well-documented research code. Here are observations, questions, and a few concrete suggestions.Strengths Worth Calling Out
compress_promptandstructured_compress_promptis genuinely useful — the math definition of compression rate and the caveats about tokenizer mismatch are the kind of context users actually need.__call__delegation tocompress_promptis a nice UX touch — lets users treat the compressor as a callable without remembering method names.rank_methodextensibility is well-designed; adding a new ranking backend (BM25, Cohere, sentence-BERT, etc.) is a clean one-function addition insideget_rank_results.Questions & Observations
1. Mutable default arguments in function signatures
Several public methods use mutable defaults:
Python shares the same list object across all calls when the default is a mutable literal. This is a classic footgun — if any call path appends to
force_tokens, subsequent calls with the default will see the accumulated state. The standard fix isforce_tokens: List[str] = Nonewith an earlyif force_tokens is None: force_tokens = []guard. Given the library's public API surface, this seems worth hardening.2.
trust_remote_code=Trueon by defaultIn
load_model:The default is
True, meaning any model loaded from HuggingFace Hub will execute arbitrary remote code without the user opting in. For a library with a broad user base (LangChain, LlamaIndex integrations), this seems like a surprising default. Most downstream frameworks default this toFalseand require explicit opt-in. Would a change to defaultFalsebreak any of the recommended models, or is it only needed for specific quantized variants?3.
TokenClfDatasethardcodes model name string matchingThe same string-matching pattern appears in
is_begin_of_new_wordandget_pure_tokeninutils.py. This means any new model (even a fine-tune likemy-org/xlm-roberta-large-finetuned) may silently route to the wrong branch or raise. A model family enum or a config-driven approach would make adding new model families less error-prone and avoid the current reliance on substring matching.4.
savingis hardcoded to GPT-4 pricingThe
$0.06/1K tokensfigure (GPT-4 input pricing circa 2023) is baked into the return value of bothcompress_promptandcompress_prompt_llmlingua2. GPT-4o pricing is now ~10x cheaper, and users routing to other models will see a misleading savings estimate. Options: accept an optionalcost_per_tokenparameter, document the assumption clearly in the return value docstring, or drop this field in favor of letting callers compute savings from the raw token counts.5.
process_sequence_datahas a typo — dict values use[bracketsA JSON dict should be wrapped in
{/}, not[/]. This may be intentional for the internal markup format, but if so it would benefit from a comment explaining why, since it looks like a bug to a reader unfamiliar with the internal representation.6.
get_rank_resultsnests all ranking functions as inner closuresThe twelve
get_distance_*functions are all defined as closures insideget_rank_resultsand re-created on every call. This isn't a correctness issue, but it makes the function ~200 lines long and prevents unit-testing individual ranking methods in isolation. Extracting them as private methods or a strategy registry would improve testability and readability.Minor Suggestions
context_budgetparameter accepts a string like"+100"but this is not validated — a malformed string would produce a silent arithmetic error downstream. A brief validation or an explicitint | strunion type with parsing docs would help.seed_everything(42)is called unconditionally ininit_llmlingua2, which globally setstorch,numpy, andrandomseeds. This is fine for reproducibility in research but is a side-effect that library users may not expect. Worth noting in the docstring.Summary
The library is well-written research code with production integration already in LangChain and LlamaIndex. The main areas I'd prioritize for a stability pass are the mutable default argument issue (correctness risk), the
trust_remote_codedefault (security), and the hardcoded pricing constant (staleness). The string-matching model dispatch is a longer-term maintainability concern.Thanks for the work on this — the LongLLMLingua context-reordering approach in particular is elegant.
— A developer exploring LLMLingua for RAG cost optimization