
Conversation

@EdisonSu768 (Member) commented Jan 13, 2026

Summary by CodeRabbit

  • Documentation
    • Updated navigation label for model optimization section.
    • Added comprehensive LLM Compressor documentation including introduction to compression techniques (weight quantization and pruning).
    • Added integration guide for LLM Compressor with Alauda AI platform.
    • Added manual evaluation guidelines.
    • Added example notebooks demonstrating data-free and calibration-based compression workflows.

✏️ Tip: You can customize this high-level summary in your review settings.


coderabbitai bot commented Jan 13, 2026

Walkthrough

Navigation label updated from "PreTraining" to "Training" in tools documentation. Comprehensive LLM Compressor documentation added, including introduction, index, how-to guides covering Alauda AI integration and model evaluation workflows. Two example Jupyter notebooks demonstrate data-free and calibration-based compression techniques using llm-compressor.

Changes

Cohort / File(s) / Summary

  • Navigation Update (docs/en/installation/tools.mdx): Updated nav_pre_train label text from "PreTraining" to "Training" (bilingual: 预训练 → 训练)
  • LLM Compressor Documentation Root (docs/en/llm-compressor/intro.mdx, docs/en/llm-compressor/index.mdx): Added introduction describing LLM Compressor, its integration with Hugging Face and vLLM, and compression techniques (W4A16, W8A8, pruning); added root index page with Overview component
  • LLM Compressor How-To Guides (docs/en/llm-compressor/how_to/index.mdx, docs/en/llm-compressor/how_to/compressor_by_workbench.mdx, docs/en/llm-compressor/how_to/evaluate_model.mdx): Added how-to index page; documented Alauda AI platform integration with workbench-based workflows; documented manual evaluation steps including prerequisites, custom task creation, lm_eval patching, and evaluation configuration
  • Example Jupyter Notebooks (docs/public/data-free-compressor.ipynb, docs/public/calibration-compressor.ipynb): Added data-free compression example with W4A16 quantization recipe; added GPTQ calibration-based compression example with calibration dataset preparation and evaluation; both notebooks include lm_eval integration for performance assessment (a minimal sketch of the data-free workflow follows below)
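For orientation, the data-free W4A16 flow in that notebook amounts to a single one-shot quantization pass. A minimal sketch, assuming the llm-compressor `oneshot` API and the local TinyLlama checkout used in the notebooks (names here follow the PR's examples, not verified against the final files):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

model_id = "./TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # local model path, as in the notebooks
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Data-free W4A16: quantize Linear layer weights to 4-bit, keep lm_head in full precision
recipe = QuantizationModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])
oneshot(model=model, recipe=recipe)

# Save the compressed checkpoint alongside its tokenizer
model_dir = "./" + model_id.split("/")[-1] + "-W4A16"
model.save_pretrained(model_dir)
tokenizer.save_pretrained(model_dir)
```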

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested reviewers

  • zhaomingkun1030
  • typhoonzero

Poem

📚 Hop along through docs so bright,
Compression guides to make things light,
From intro page to notebooks two,
LLM Compressor waits for you!
🐰 —CodeRabbit

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Title check: ❓ Inconclusive. The title 'feat: llm compressor AI-23582' is vague and does not clearly communicate the main change; it lacks specificity about what was added or modified in the documentation. Resolution: consider a more descriptive title such as 'docs: Add LLM Compressor documentation and example notebooks' to clearly convey that documentation files and examples are being added.

✅ Passed checks (2 passed)

  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


cloudflare-workers-and-pages bot commented Jan 13, 2026

Deploying alauda-ai with Cloudflare Pages

Latest commit: a26a899
Status: ✅  Deploy successful!
Preview URL: https://dc144ea6.alauda-ai.pages.dev
Branch Preview URL: https://feat-llm-compressor.alauda-ai.pages.dev

View logs

@EdisonSu768 changed the title from "chore: change to training AI-23523" to "feat: llm compressor AI-23582" on Jan 14, 2026
@EdisonSu768 marked this pull request as ready for review January 14, 2026 09:21
@EdisonSu768 self-assigned this Jan 14, 2026

@coderabbitai bot left a comment

Actionable comments posted: 8

🤖 Fix all issues with AI agents
In `@docs/en/llm-compressor/how_to/compressor_by_workbench.mdx`:
- Line 50: In docs/en/llm-compressor/how_to/compressor_by_workbench.mdx update
the link target that currently points to
../../model_inference/inference_service/functions/inference_service.html#create-inference-service
so it uses the correct .mdx extension
(../../model_inference/inference_service/functions/inference_service.mdx#create-inference-service);
this fixes the broken link when creating a new inference service after uploading
the compressed model.

In `@docs/en/llm-compressor/how_to/evaluate_model.mdx`:
- Around line 31-46: The docs currently hardcode a Python version in the
site-packages path; update the instructions to avoid a specific Python version
by telling users to edit the lm_eval/tasks/__init__.py file inside their active
virtualenv's site-packages directory (do not hardcode
~/.venv/lib/python3.11/...), and instruct them to locate the block that computes
relative_yaml_path (the try/except around
yaml_path.relative_to(lm_eval_tasks_path)) and apply the suggested
ValueError-handling change there; also add a short note directing users to use
their Python/virtualenv tools to discover their site-packages location rather
than assuming a path.
- Around line 54-73: The YAML example uses an incorrect function reference
syntax: replace the hyphenated `!function preprocess_wikitext-process_results`
with the proper import-style dotted path `!function
preprocess_wikitext.process_results` (matching the other `!function
preprocess_wikitext.wikitext_detokenizer` usage) so the `!function` directive
points to the module attribute correctly.

In `@docs/public/calibration-compressor.ipynb`:
- Around line 122-132: The notebook calls tokenizer.save_pretrained(model_dir)
but tokenizer is undefined; import or instantiate the tokenizer before this save
step (e.g., load/create the tokenizer tied to model_id) so that tokenizer exists
when saving; ensure the tokenizer variable matches the model used (referencing
tokenizer, model, model_id, and model_dir) and place the tokenizer
initialization before the save_pretrained call.
- Around line 148-170: The notebook has syntax errors from stray spaces in
identifiers: fix the tokenization so uses os.environ (not "os. environ"),
lm_eval.tasks (not "lm_eval. tasks"), and TaskManager( (not "TaskManager (")
when constructing task_manager; update those occurrences in the cell to remove
the extraneous spaces so the imports and the TaskManager(...) call parse
correctly.
- Around line 102-120: The notebook cell is missing imports for
AutoModelForCausalLM and oneshot; add import statements for these symbols (e.g.,
from transformers import AutoModelForCausalLM and from the library that provides
oneshot) in an earlier cell or at the top of this cell so
AutoModelForCausalLM.from_pretrained(...) and oneshot(...) resolve correctly;
ensure the import for oneshot matches the package used elsewhere in the project
(the same module that defines oneshot).
- Around line 36-100: The notebook fails because tokenizer is used inside
preprocess but never defined; add importing and loading of the model tokenizer
(e.g., from transformers import AutoTokenizer) and instantiate tokenizer before
the dataset cell (matching the model used for compression, e.g.,
AutoTokenizer.from_pretrained(model_id, use_fast=... or appropriate kwargs));
ensure the tokenizer exposes apply_chat_template (or wrap/assign a function if
your tokenizer wrapper provides that) so preprocess,
tokenizer.apply_chat_template and tokenizer(...) calls succeed.

In `@docs/public/data-free-compressor.ipynb`:
- Around line 109-131: There are syntax errors from stray spaces in identifiers
and a potential missed usage of the TaskManager: remove the spaces so use
os.environ (not os. environ), lm_eval.tasks (not lm_eval. tasks), and call
TaskManager(...) (not TaskManager ( ... )), then either pass the created
task_manager into lm_eval.simple_evaluate (or add a clear comment that
instantiating task_manager is relied upon for side-effect registration) to make
intent explicit; update any related variable names (task_manager, my-wikitext)
accordingly.
🧹 Nitpick comments (2)
docs/en/installation/tools.mdx (1)

138-140: Consider updating the i18nKey for consistency.

The i18nKey remains "nav_pre_train" while the display text has been updated to "Training". While i18n keys don't technically need to match the display text, maintaining consistency between the key name and its meaning improves maintainability.

Since this is example documentation showing what the merged ConfigMap looks like, verify whether the actual system configuration also needs this i18nKey updated to something like "nav_training".

docs/en/llm-compressor/intro.mdx (1)

14-14: Minor: Use hyphenated "floating-point" as compound adjective.

When used as a compound adjective before a noun, "floating-point" should be hyphenated.

📝 Suggested fix
-- Weight and activation quantization (W8A8) compresses both weights and activations to 8-bit precision, targeting general server scenarios for integer and floating point formats.
+- Weight and activation quantization (W8A8) compresses both weights and activations to 8-bit precision, targeting general server scenarios for integer and floating-point formats.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6090ae2 and a26a899.

📒 Files selected for processing (8)
  • docs/en/installation/tools.mdx
  • docs/en/llm-compressor/how_to/compressor_by_workbench.mdx
  • docs/en/llm-compressor/how_to/evaluate_model.mdx
  • docs/en/llm-compressor/how_to/index.mdx
  • docs/en/llm-compressor/index.mdx
  • docs/en/llm-compressor/intro.mdx
  • docs/public/calibration-compressor.ipynb
  • docs/public/data-free-compressor.ipynb
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-31T02:30:16.360Z
Learnt from: EdisonSu768
Repo: alauda/aml-docs PR: 73
File: docs/en/monitoring_ops/resource_monitoring/how_to/add_monitor_dashboard.mdx:28-45
Timestamp: 2025-12-31T02:30:16.360Z
Learning: In MDX documentation files (e.g., docs/.../*.mdx), when including PromQL code blocks, use bash as the syntax highlighter fallback because the rspress system does not support PromQL highlighting. Ensure the code blocks specify the language as bash (e.g., ```bash) where PromQL would appear, to maintain readability and avoid broken highlighting.

Applied to files:

  • docs/en/llm-compressor/how_to/index.mdx
  • docs/en/installation/tools.mdx
  • docs/en/llm-compressor/how_to/evaluate_model.mdx
  • docs/en/llm-compressor/index.mdx
  • docs/en/llm-compressor/intro.mdx
  • docs/en/llm-compressor/how_to/compressor_by_workbench.mdx
🪛 LanguageTool
docs/en/llm-compressor/how_to/evaluate_model.mdx

[style] ~50-~50: Consider using “inaccessible” to avoid wordiness.
Context: ...m Hugging Face. Because Hugging Face is not accessible from mainland China, you must define a ...

(NOT_ABLE_PREMIUM)

docs/en/llm-compressor/intro.mdx

[grammar] ~14-~14: Use a hyphen to join words.
Context: ...erver scenarios for integer and floating point formats. - Weight pruning, also kn...

(QB_NEW_EN_HYPHEN)


[style] ~15-~15: ‘in conjunction with’ might be wordy. Consider a shorter alternative.
Context: ...is requires fine-tuning, it can be used in conjunction with quantization for further inference acce...

(EN_WORDINESS_PREMIUM_IN_CONJUNCTION_WITH)

🪛 Ruff (0.14.11)
docs/public/calibration-compressor.ipynb

34-34: Undefined name tokenizer

(F821)


38-38: Undefined name tokenizer

(F821)


51-51: Undefined name AutoModelForCausalLM

(F821)


55-55: Undefined name oneshot

(F821)


67-67: Undefined name tokenizer

(F821)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pipelines as Code CI / doc-pr-build-ai
🔇 Additional comments (10)
docs/en/llm-compressor/index.mdx (1)

1-7: LGTM!

Standard index page structure with appropriate frontmatter weight and the <Overview /> component placeholder.

docs/en/llm-compressor/how_to/index.mdx (1)

1-7: LGTM!

Standard how-to section index page following the established documentation pattern.

docs/en/llm-compressor/intro.mdx (1)

7-15: LGTM! Clear introduction to LLM Compressor.

The content provides a good overview of the framework, its integration with Hugging Face and vLLM, and the supported compression techniques. The external links to the GitHub repository and vLLM documentation are helpful references.

docs/en/llm-compressor/how_to/compressor_by_workbench.mdx (2)

1-50: LGTM! Comprehensive workflow documentation.

The guide provides clear, well-structured instructions for using LLM Compressor with the Alauda AI platform. The workflow steps are logically organized, and the conditional sections (optional dataset preparation for data-free vs calibration workflows) are appropriately marked.


9-10: The notebook paths are correctly configured. Both referenced files exist at docs/public/calibration-compressor.ipynb and docs/public/data-free-compressor.ipynb. The absolute paths (/data-free-compressor.ipynb, /calibration-compressor.ipynb) are the correct format for referencing files in the public directory and will resolve properly when the documentation is built.

docs/public/data-free-compressor.ipynb (3)

1-20: LGTM!

Clear introduction with appropriate notes about GPU requirements and lm_eval version dependency.


28-54: LGTM!

GPU detection and quantization recipe setup are well-implemented. The W4A16 scheme with lm_head exclusion is a sensible default configuration.


56-93: LGTM!

Model loading, compression, and saving workflow is correct. Using device_map="auto" and torch_dtype="auto" provides good flexibility.

docs/public/calibration-compressor.ipynb (1)

1-35: LGTM!

Clear introduction explaining the calibration-based workflow and GPTQ algorithm usage.

docs/en/llm-compressor/how_to/evaluate_model.mdx (1)

79-79: The notebook links are correct. The notebooks exist at docs/public/calibration-compressor.ipynb and docs/public/data-free-compressor.ipynb, and the absolute paths /data-free-compressor.ipynb and /calibration-compressor.ipynb follow the standard documentation site pattern where the public assets directory is served from the root. These links are used consistently across multiple documentation files.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.


### Deploy and Use the Compressed Model for Inference

After uploading the compressed model, create a new inference service to deploy and use it. Follow the instructions in [create inference service](../../model_inference/inference_service/functions/inference_service.html#create-inference-service) to complete this step.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find all inference_service files in the repository
find . -type f \( -name "*inference_service*" \) | head -20

Repository: alauda/aml-docs

Length of output: 223


🏁 Script executed:

#!/bin/bash
# List the documentation structure around model_inference
git ls-files | grep -E "model_inference.*inference_service" | head -20

Repository: alauda/aml-docs

Length of output: 909


🏁 Script executed:

#!/bin/bash
# Check the actual path mentioned in the link
ls -la ./docs/en/model_inference/inference_service/functions/ 2>/dev/null || echo "Path with functions subdirectory not found"

Repository: alauda/aml-docs

Length of output: 304


🏁 Script executed:

#!/bin/bash
# Check alternative paths
ls -la ./docs/en/model_inference/inference_service/ 2>/dev/null || echo "Path without functions subdirectory not found"

Repository: alauda/aml-docs

Length of output: 461


Link extension is incorrect: should be .mdx not .html.

This link uses .html but the actual file is docs/en/model_inference/inference_service/functions/inference_service.mdx. Change the extension to .mdx to avoid broken links.

📝 Suggested fix
-After uploading the compressed model, create a new inference service to deploy and use it. Follow the instructions in [create inference service](../../model_inference/inference_service/functions/inference_service.html#create-inference-service) to complete this step.
+After uploading the compressed model, create a new inference service to deploy and use it. Follow the instructions in [create inference service](../../model_inference/inference_service/functions/inference_service.mdx#create-inference-service) to complete this step.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
After uploading the compressed model, create a new inference service to deploy and use it. Follow the instructions in [create inference service](../../model_inference/inference_service/functions/inference_service.html#create-inference-service) to complete this step.
After uploading the compressed model, create a new inference service to deploy and use it. Follow the instructions in [create inference service](../../model_inference/inference_service/functions/inference_service.mdx#create-inference-service) to complete this step.
🤖 Prompt for AI Agents
In `@docs/en/llm-compressor/how_to/compressor_by_workbench.mdx` at line 50, In
docs/en/llm-compressor/how_to/compressor_by_workbench.mdx update the link target
that currently points to
../../model_inference/inference_service/functions/inference_service.html#create-inference-service
so it uses the correct .mdx extension
(../../model_inference/inference_service/functions/inference_service.mdx#create-inference-service);
this fixes the broken link when creating a new inference service after uploading
the compressed model.

Comment on lines +31 to +46
:::note
As of the latest release, the `lm_eval` library does not natively support custom evaluation tasks. To enable this capability, you must manually apply a small patch to the `lm_eval` source code.

Edit the following file:

`~/.venv/lib/python3.11/site-packages/lm_eval/tasks/__init__.py`

Locate approximately line 683 and update the code as shown below. For additional context, see this upstream pull request: [PR #3436](https://github.com/EleutherAI/lm-evaluation-harness/pull/3436/files).

```python
try:
    relative_yaml_path = yaml_path.relative_to(lm_eval_tasks_path)
except ValueError:
    relative_yaml_path = yaml_path
```
:::

⚠️ Potential issue | 🟡 Minor

Hardcoded Python version may cause confusion.

The path ~/.venv/lib/python3.11/site-packages/... assumes Python 3.11. Users with different Python versions will have a different path. Consider making this more generic.

📝 Suggested improvement
 Edit the following file:

-`~/.venv/lib/python3.11/site-packages/lm_eval/tasks/__init__.py`
+`~/.venv/lib/python<VERSION>/site-packages/lm_eval/tasks/__init__.py`
+
+Replace `<VERSION>` with your installed Python version (e.g., `3.10`, `3.11`).
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
:::note
As of the latest release, the `lm_eval` library does not natively support custom evaluation tasks. To enable this capability, you must manually apply a small patch to the `lm_eval` source code.
Edit the following file:
`~/.venv/lib/python3.11/site-packages/lm_eval/tasks/__init__.py`
Locate approximately line 683 and update the code as shown below. For additional context, see this upstream pull request: [PR #3436](https://github.com/EleutherAI/lm-evaluation-harness/pull/3436/files).
```python
try:
    relative_yaml_path = yaml_path.relative_to(lm_eval_tasks_path)
except ValueError:
    relative_yaml_path = yaml_path
```
:::
:::note
As of the latest release, the `lm_eval` library does not natively support custom evaluation tasks. To enable this capability, you must manually apply a small patch to the `lm_eval` source code.
Edit the following file:
`~/.venv/lib/python<VERSION>/site-packages/lm_eval/tasks/__init__.py`
Replace `<VERSION>` with your installed Python version (e.g., `3.10`, `3.11`).
Locate approximately line 683 and update the code as shown below. For additional context, see this upstream pull request: [PR `#3436`](https://github.com/EleutherAI/lm-evaluation-harness/pull/3436/files).
🤖 Prompt for AI Agents
In `@docs/en/llm-compressor/how_to/evaluate_model.mdx` around lines 31 - 46, The
docs currently hardcode a Python version in the site-packages path; update the
instructions to avoid a specific Python version by telling users to edit the
lm_eval/tasks/__init__.py file inside their active virtualenv's site-packages
directory (do not hardcode ~/.venv/lib/python3.11/...), and instruct them to
locate the block that computes relative_yaml_path (the try/except around
yaml_path.relative_to(lm_eval_tasks_path)) and apply the suggested
ValueError-handling change there; also add a short note directing users to use
their Python/virtualenv tools to discover their site-packages location rather
than assuming a path.
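
A version-agnostic way to locate that file is to ask the active environment directly; a small sketch, assuming `lm_eval` is importable in the virtualenv being patched:

```python
# Print the path of lm_eval's tasks/__init__.py in the current environment
import inspect
import lm_eval.tasks

print(inspect.getfile(lm_eval.tasks))
```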

Comment on lines +54 to +73
```yaml
task: my-wikitext
dataset_path: /home/jovyan/wikitext_document_level #[!code callout]
dataset_name: wikitext-2-raw-v1
output_type: loglikelihood_rolling
training_split: train
validation_split: validation
test_split: test
doc_to_text: ''''
doc_to_target: !function preprocess_wikitext.wikitext_detokenizer
process_results: !function preprocess_wikitext-process_results
should_decontaminate: true
doc_to_decontamination_query: "{{page}}"
metric_list:
- metric: word_perplexity
- metric: byte_perplexity
- metric: bits_per_byte
metadata:
  version: 1.0
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's find and examine the specific file mentioned
find . -name "evaluate_model.mdx" -type f

Repository: alauda/aml-docs

Length of output: 109


🏁 Script executed:

# Read the specific file to verify the actual content at lines 54-73
cat -n ./docs/en/llm-compressor/how_to/evaluate_model.mdx | sed -n '50,75p'

Repository: alauda/aml-docs

Length of output: 1216


🏁 Script executed:

# Search for the preprocess_wikitext module or file
find . -name "*preprocess_wikitext*" -o -name "*wikitext*" | grep -E "\.(py|yaml|yml|ts|js)$" | head -20

Repository: alauda/aml-docs

Length of output: 41


🏁 Script executed:

# Search for any references to preprocess_wikitext or related content
rg "preprocess_wikitext" --type md --type mdx --type yaml

Repository: alauda/aml-docs

Length of output: 86


🏁 Script executed:

# Search for preprocess_wikitext references without file type filter
rg "preprocess_wikitext"

Repository: alauda/aml-docs

Length of output: 285


🏁 Script executed:

# Check broader context around the YAML snippet
cat -n ./docs/en/llm-compressor/how_to/evaluate_model.mdx | sed -n '40,90p'

Repository: alauda/aml-docs

Length of output: 2107


🌐 Web query:

lm_eval YAML !function directive syntax module path naming

💡 Result:

Short answer: use the YAML tag !function followed by a Python import-style path to the class or callable (module and attribute separated by a dot), e.g.

  • class: !function task.SQuAD2

Details / rules:

  • The string is treated as a Python import path (module.attribute): lm-eval will import the module and fetch that attribute (a Task class or callable). [1][2]
  • The module must be importable from the harness run context — either a module inside lm_eval (e.g. task.*) or in a directory you add with --include_path / by placing YAML/python files under lm_eval/tasks. [2][3]
  • You can pass additional config/arguments via fields like recipe (parsed/forwarded to the class constructor / ConfigurableTask). [2]
  • Examples in the docs show group/task entries using class: !function task.YourClass and using recipe to pass custom args. [2][1]

Sources: lm-eval new task guide / docs and examples. [1][2][3]

References
[1] New Task Guide — lm-evaluation-harness (YAML !function examples).
[2] lm-eval docs / examples showing class: !function task.SQuAD2 and group/task usage.
[3] lm-eval task configuration / include_path notes.


Fix incorrect function reference syntax in YAML example.

Line 64 uses preprocess_wikitext-process_results (with hyphen), which is incorrect. The !function directive in lm_eval requires Python import-style paths with dots separating module and attributes. This should be:

Diff
-process_results: !function preprocess_wikitext-process_results
+process_results: !function preprocess_wikitext.process_results
🤖 Prompt for AI Agents
In `@docs/en/llm-compressor/how_to/evaluate_model.mdx` around lines 54 - 73, The
YAML example uses an incorrect function reference syntax: replace the hyphenated
`!function preprocess_wikitext-process_results` with the proper import-style
dotted path `!function preprocess_wikitext.process_results` (matching the other
`!function preprocess_wikitext.wikitext_detokenizer` usage) so the `!function`
directive points to the module attribute correctly.
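
For context, both `!function` entries point at a helper module (here `preprocess_wikitext.py`) that has to live next to the task YAML. A rough, simplified sketch of what such a module typically contains for a wikitext-style perplexity task (the real upstream helper performs more detokenization steps):

```python
import re


def wikitext_detokenizer(doc):
    # Undo common wikitext tokenization artifacts (heavily simplified)
    string = doc["page"]
    string = re.sub(r" '(\w)", r"'\1", string)
    string = string.replace(" n't", "n't").replace(" .", ".").replace(" ,", ",")
    return string


def process_results(doc, results):
    # lm_eval passes the rolling loglikelihood; report it with word/byte counts
    # so the perplexity metrics declared in the YAML can be aggregated.
    (loglikelihood,) = results
    words = len(re.split(r"\s+", doc["page"]))
    num_bytes = len(doc["page"].encode("utf-8"))
    return {
        "word_perplexity": (loglikelihood, words),
        "byte_perplexity": (loglikelihood, num_bytes),
        "bits_per_byte": (loglikelihood, num_bytes),
    }
```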

Comment on lines +36 to +100
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"\n",
"use_gpu = torch.cuda.is_available()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# We will use a new recipe running GPTQ (https://arxiv.org/abs/2210.17323)\n",
"# to reduce error caused by quantization. GPTQ requires a calibration dataset.\n",
"from llmcompressor.modifiers.quantization import GPTQModifier\n",
"\n",
"# model to compress\n",
"model_id = \"./TinyLlama/TinyLlama-1.1B-Chat-v1.0\"\n",
"recipe = GPTQModifier(targets=\"Linear\", scheme=\"W4A16\", ignore=[\"lm_head\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from datasets import load_dataset\n",
"\n",
"# Create the calibration dataset, using Huggingface datasets API\n",
"dataset_id = \"./ultrachat_200k\"\n",
"\n",
"# Select number of samples. 512 samples is a good place to start.\n",
"# Increasing the number of samples can improve accuracy.\n",
"num_calibration_samples = 512 if use_gpu else 4\n",
"max_sequence_length = 2048 if use_gpu else 16\n",
"\n",
"# Load dataset\n",
"ds = load_dataset(dataset_id, split=\"train_sft\")\n",
"# Shuffle and grab only the number of samples we need\n",
"ds = ds.shuffle(seed=42).select(range(num_calibration_samples))\n",
"\n",
"\n",
"# Preprocess and tokenize into format the model uses\n",
"def preprocess(example):\n",
" text = tokenizer.apply_chat_template(\n",
" example[\"messages\"],\n",
" tokenize=False,\n",
" )\n",
" return tokenizer(\n",
" text,\n",
" padding=False,\n",
" max_length=max_sequence_length,\n",
" truncation=True,\n",
" add_special_tokens=False,\n",
" )\n",
"\n",
"\n",
"ds = ds.map(preprocess, remove_columns=ds.column_names)"
]

⚠️ Potential issue | 🔴 Critical

Missing tokenizer definition before use.

The preprocess function references tokenizer (lines 86 and 90), but tokenizer is never defined in this notebook. In the data-free notebook, the tokenizer is loaded with AutoTokenizer.from_pretrained(). This cell will fail with a NameError.

🐛 Proposed fix - add tokenizer import and loading

Add a cell before the dataset loading cell, or modify the existing cell to include:

+"from transformers import AutoTokenizer\n",
+"\n",
 "from datasets import load_dataset\n",
 "\n",
 "# Create the calibration dataset, using Huggingface datasets API\n",
 "dataset_id = \"./ultrachat_200k\"\n",
+"\n",
+"# Load tokenizer for preprocessing\n",
+"tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)\n",
🧰 Tools
🪛 Ruff (0.14.11)

38-38: Undefined name tokenizer

(F821)


51-51: Undefined name AutoModelForCausalLM

(F821)


55-55: Undefined name oneshot

(F821)


67-67: Undefined name tokenizer

(F821)

🤖 Prompt for AI Agents
In `@docs/public/calibration-compressor.ipynb` around lines 36 - 100, The notebook
fails because tokenizer is used inside preprocess but never defined; add
importing and loading of the model tokenizer (e.g., from transformers import
AutoTokenizer) and instantiate tokenizer before the dataset cell (matching the
model used for compression, e.g., AutoTokenizer.from_pretrained(model_id,
use_fast=... or appropriate kwargs)); ensure the tokenizer exposes
apply_chat_template (or wrap/assign a function if your tokenizer wrapper
provides that) so preprocess, tokenizer.apply_chat_template and tokenizer(...)
calls succeed.

Comment on lines +102 to +120
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# oneshot modifies model in-place, so reload\n",
"model = AutoModelForCausalLM.from_pretrained(\n",
" model_id, device_map=\"auto\", torch_dtype=\"auto\"\n",
")\n",
"# run oneshot again, with dataset\n",
"model = oneshot(\n",
" model=model,\n",
" dataset=ds,\n",
" recipe=recipe,\n",
" max_seq_length=max_sequence_length,\n",
" num_calibration_samples=num_calibration_samples,\n",
")"
]

⚠️ Potential issue | 🔴 Critical

Missing imports for AutoModelForCausalLM and oneshot.

This cell uses AutoModelForCausalLM and oneshot which are not imported in this notebook. These were imported in the data-free notebook but this notebook appears to be standalone.

🐛 Proposed fix - add missing imports

Add imports at the beginning of this cell or in a prior cell:

+"from transformers import AutoModelForCausalLM\n",
+"from llmcompressor import oneshot\n",
+"\n",
 "# oneshot modifies model in-place, so reload\n",
 "model = AutoModelForCausalLM.from_pretrained(\n",
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# oneshot modifies model in-place, so reload\n",
"model = AutoModelForCausalLM.from_pretrained(\n",
" model_id, device_map=\"auto\", torch_dtype=\"auto\"\n",
")\n",
"# run oneshot again, with dataset\n",
"model = oneshot(\n",
" model=model,\n",
" dataset=ds,\n",
" recipe=recipe,\n",
" max_seq_length=max_sequence_length,\n",
" num_calibration_samples=num_calibration_samples,\n",
")"
]
from transformers import AutoModelForCausalLM
from llmcompressor import oneshot
# oneshot modifies model in-place, so reload
model = AutoModelForCausalLM.from_pretrained(
model_id, device_map="auto", torch_dtype="auto"
)
# run oneshot again, with dataset
model = oneshot(
model=model,
dataset=ds,
recipe=recipe,
max_seq_length=max_sequence_length,
num_calibration_samples=num_calibration_samples,
)
🤖 Prompt for AI Agents
In `@docs/public/calibration-compressor.ipynb` around lines 102 - 120, The
notebook cell is missing imports for AutoModelForCausalLM and oneshot; add
import statements for these symbols (e.g., from transformers import
AutoModelForCausalLM and from the library that provides oneshot) in an earlier
cell or at the top of this cell so AutoModelForCausalLM.from_pretrained(...) and
oneshot(...) resolve correctly; ensure the import for oneshot matches the
package used elsewhere in the project (the same module that defines oneshot).

Comment on lines +122 to +132
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Save model and tokenizer\n",
"model_dir = \"./\" + model_id.split(\"/\")[-1] + \"-GPTQ-W4A16\"\n",
"model.save_pretrained(model_dir)\n",
"tokenizer.save_pretrained(model_dir);"
]

⚠️ Potential issue | 🔴 Critical

Undefined tokenizer in save cell.

Line 131 calls tokenizer.save_pretrained(model_dir) but tokenizer is still undefined due to the missing import noted earlier.
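
If the tokenizer is loaded in the earlier dataset cell (as proposed above), this cell works as written; otherwise a minimal fix, assuming `model_id` and `model_dir` from the surrounding cells, is:

```python
from transformers import AutoTokenizer

# Load the tokenizer that matches the compressed model, then save it with the model
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.save_pretrained(model_dir)
```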

🤖 Prompt for AI Agents
In `@docs/public/calibration-compressor.ipynb` around lines 122 - 132, The
notebook calls tokenizer.save_pretrained(model_dir) but tokenizer is undefined;
import or instantiate the tokenizer before this save step (e.g., load/create the
tokenizer tied to model_id) so that tokenizer exists when saving; ensure the
tokenizer variable matches the model used (referencing tokenizer, model,
model_id, and model_dir) and place the tokenizer initialization before the
save_pretrained call.

Comment on lines +148 to +170
"source": [
"import os\n",
"os. environ [\"VLLM_USE_V1\"] = \"0\"\n",
"\n",
"import lm_eval\n",
"from lm_eval.utils import make_table\n",
"\n",
"from lm_eval. tasks import TaskManager\n",
"task_manager = TaskManager (include_path=\"./my-wikitext.yaml\")\n",
"\n",
"results = lm_eval.simple_evaluate(\n",
" model=\"vllm\" if use_gpu else \"hf\",\n",
" model_args={\n",
" \"pretrained\": model_dir,\n",
" \"add_bos_token\": True,\n",
" \"device\": \"auto\",\n",
" \"gpu_memory_utilization\": 0.8,\n",
" },\n",
" tasks=[\"my-wikitext\"],\n",
" batch_size=\"auto\" if use_gpu else 4,\n",
" limit=None if use_gpu else 4,\n",
")"
]

⚠️ Potential issue | 🔴 Critical

Same syntax errors as data-free notebook.

Lines 150, 155, and 156 contain the same extraneous spaces that will cause syntax errors:

  • os. environ → os.environ
  • lm_eval. tasks → lm_eval.tasks
  • TaskManager ( → TaskManager(
🐛 Proposed fix
 "import os\n",
-"os. environ [\"VLLM_USE_V1\"] = \"0\"\n",
+"os.environ[\"VLLM_USE_V1\"] = \"0\"\n",
 "\n",
 "import lm_eval\n",
 "from lm_eval.utils import make_table\n",
 "\n",
-"from lm_eval. tasks import TaskManager\n",
-"task_manager = TaskManager (include_path=\"./my-wikitext.yaml\")\n",
+"from lm_eval.tasks import TaskManager\n",
+"task_manager = TaskManager(include_path=\"./my-wikitext.yaml\")\n",
🤖 Prompt for AI Agents
In `@docs/public/calibration-compressor.ipynb` around lines 148 - 170, The
notebook has syntax errors from stray spaces in identifiers: fix the
tokenization so uses os.environ (not "os. environ"), lm_eval.tasks (not
"lm_eval. tasks"), and TaskManager( (not "TaskManager (") when constructing
task_manager; update those occurrences in the cell to remove the extraneous
spaces so the imports and the TaskManager(...) call parse correctly.

Comment on lines +109 to +131
"source": [
"import os\n",
"os. environ [\"VLLM_USE_V1\"] = \"0\"\n",
"\n",
"import lm_eval\n",
"from lm_eval.utils import make_table\n",
"\n",
"from lm_eval. tasks import TaskManager\n",
"task_manager = TaskManager (include_path=\"./my-wikitext.yaml\")\n",
"\n",
"results = lm_eval.simple_evaluate(\n",
" model=\"vllm\" if use_gpu else \"hf\",\n",
" model_args={\n",
" \"pretrained\": model_dir,\n",
" \"add_bos_token\": True,\n",
" \"device\": \"auto\",\n",
" \"gpu_memory_utilization\": 0.8,\n",
" },\n",
" tasks=[\"my-wikitext\"],\n",
" batch_size=\"auto\" if use_gpu else 4,\n",
" limit=None if use_gpu else 4,\n",
")"
]

⚠️ Potential issue | 🔴 Critical

Fix syntax errors with extraneous spaces.

Lines 111, 116, and 117 contain spaces within identifiers and function calls that will cause syntax errors:

  • os. environ should be os.environ
  • lm_eval. tasks should be lm_eval.tasks
  • TaskManager ( should be TaskManager(

Also, task_manager is instantiated but not explicitly passed to simple_evaluate. If the side-effect of registering the task path is intended, consider adding a brief comment to clarify this.

🐛 Proposed fix
 "import os\n",
-"os. environ [\"VLLM_USE_V1\"] = \"0\"\n",
+"os.environ[\"VLLM_USE_V1\"] = \"0\"\n",
 "\n",
 "import lm_eval\n",
 "from lm_eval.utils import make_table\n",
 "\n",
-"from lm_eval. tasks import TaskManager\n",
-"task_manager = TaskManager (include_path=\"./my-wikitext.yaml\")\n",
+"from lm_eval.tasks import TaskManager\n",
+"# Initialize TaskManager to register custom task path\n",
+"task_manager = TaskManager(include_path=\"./my-wikitext.yaml\")\n",
🤖 Prompt for AI Agents
In `@docs/public/data-free-compressor.ipynb` around lines 109 - 131, There are
syntax errors from stray spaces in identifiers and a potential missed usage of
the TaskManager: remove the spaces so use os.environ (not os. environ),
lm_eval.tasks (not lm_eval. tasks), and call TaskManager(...) (not TaskManager (
... )), then either pass the created task_manager into lm_eval.simple_evaluate
(or add a clear comment that instantiating task_manager is relied upon for
side-effect registration) to make intent explicit; update any related variable
names (task_manager, my-wikitext) accordingly.

In JupyterLab, open the **Launcher** page, select the **Terminal** tile, and run the following commands to install the required dependencies:

```bash
/.venv/bin/python -m pip install vllm==0.8.5 -i https://pypi.tuna.tsinghua.edu.cn/simple && #[!code callout]
Contributor

Should this be changed to the version that our release supports?
