
Conversation

@EdisonSu768 (Member) commented Jan 13, 2026

Summary by CodeRabbit

  • Documentation
    • Updated navigation label for model optimization section.
    • Added comprehensive LLM Compressor documentation including introduction to compression techniques (weight quantization and pruning).
    • Added integration guide for LLM Compressor with Alauda AI platform.
    • Added manual evaluation guidelines.
    • Added example notebooks demonstrating data-free and calibration-based compression workflows.

✏️ Tip: You can customize this high-level summary in your review settings.


coderabbitai bot commented Jan 13, 2026

Walkthrough

Navigation label updated from "PreTraining" to "Training" in tools documentation. Comprehensive LLM Compressor documentation added, including introduction, index, how-to guides covering Alauda AI integration and model evaluation workflows. Two example Jupyter notebooks demonstrate data-free and calibration-based compression techniques using llm-compressor.

Changes

Cohort / File(s) / Summary

  • Navigation Update (docs/en/installation/tools.mdx): Updated nav_pre_train label text from "PreTraining" to "Training" (bilingual: 预训练 → 训练)
  • LLM Compressor Documentation Root (docs/en/llm-compressor/intro.mdx, docs/en/llm-compressor/index.mdx): Added introduction describing LLM Compressor, its integration with Hugging Face and vLLM, and compression techniques (W4A16, W8A8, pruning); added root index page with Overview component
  • LLM Compressor How-To Guides (docs/en/llm-compressor/how_to/index.mdx, docs/en/llm-compressor/how_to/compressor_by_workbench.mdx, docs/en/llm-compressor/how_to/evaluate_model.mdx): Added how-to index page; documented Alauda AI platform integration with workbench-based workflows; documented manual evaluation steps including prerequisites, custom task creation, lm_eval patching, and evaluation configuration
  • Example Jupyter Notebooks (docs/public/data-free-compressor.ipynb, docs/public/calibration-compressor.ipynb): Added data-free compression example with W4A16 quantization recipe; added GPTQ calibration-based compression example with calibration dataset preparation and evaluation; both notebooks include lm_eval integration for performance assessment (a minimal sketch of the data-free workflow follows below)
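For orientation, the data-free W4A16 flow in that notebook amounts to a single one-shot quantization pass. A minimal sketch, assuming the llm-compressor `oneshot` API and the local TinyLlama checkout used in the notebooks (names here follow the PR's examples, not verified against the final files):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

model_id = "./TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # local model path, as in the notebooks
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Data-free W4A16: quantize Linear layer weights to 4-bit, keep lm_head in full precision
recipe = QuantizationModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])
oneshot(model=model, recipe=recipe)

# Save the compressed checkpoint alongside its tokenizer
model_dir = "./" + model_id.split("/")[-1] + "-W4A16"
model.save_pretrained(model_dir)
tokenizer.save_pretrained(model_dir)
```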

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested reviewers

  • zhaomingkun1030
  • typhoonzero

Poem

📚 Hop along through docs so bright,
Compression guides to make things light,
From intro page to notebooks two,
LLM Compressor waits for you!
🐰 —CodeRabbit

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Title check: ❓ Inconclusive. The title 'feat: llm compressor AI-23582' is vague and does not clearly communicate the main change; it lacks specificity about what was added or modified in the documentation. Resolution: consider a more descriptive title such as 'docs: Add LLM Compressor documentation and example notebooks' to clearly convey that documentation files and examples are being added.

✅ Passed checks (2 passed)

  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


cloudflare-workers-and-pages bot commented Jan 13, 2026

Deploying alauda-ai with Cloudflare Pages

Latest commit: a26a899
Status: ✅  Deploy successful!
Preview URL: https://dc144ea6.alauda-ai.pages.dev
Branch Preview URL: https://feat-llm-compressor.alauda-ai.pages.dev

View logs

@EdisonSu768 changed the title from "chore: change to training AI-23523" to "feat: llm compressor AI-23582" on Jan 14, 2026
@EdisonSu768 marked this pull request as ready for review January 14, 2026 09:21
@EdisonSu768 self-assigned this Jan 14, 2026

@coderabbitai bot left a comment

Actionable comments posted: 8

🤖 Fix all issues with AI agents
In `@docs/en/llm-compressor/how_to/compressor_by_workbench.mdx`:
- Line 50: In docs/en/llm-compressor/how_to/compressor_by_workbench.mdx update
the link target that currently points to
../../model_inference/inference_service/functions/inference_service.html#create-inference-service
so it uses the correct .mdx extension
(../../model_inference/inference_service/functions/inference_service.mdx#create-inference-service);
this fixes the broken link when creating a new inference service after uploading
the compressed model.

In `@docs/en/llm-compressor/how_to/evaluate_model.mdx`:
- Around line 31-46: The docs currently hardcode a Python version in the
site-packages path; update the instructions to avoid a specific Python version
by telling users to edit the lm_eval/tasks/__init__.py file inside their active
virtualenv's site-packages directory (do not hardcode
~/.venv/lib/python3.11/...), and instruct them to locate the block that computes
relative_yaml_path (the try/except around
yaml_path.relative_to(lm_eval_tasks_path)) and apply the suggested
ValueError-handling change there; also add a short note directing users to use
their Python/virtualenv tools to discover their site-packages location rather
than assuming a path.
- Around line 54-73: The YAML example uses an incorrect function reference
syntax: replace the hyphenated `!function preprocess_wikitext-process_results`
with the proper import-style dotted path `!function
preprocess_wikitext.process_results` (matching the other `!function
preprocess_wikitext.wikitext_detokenizer` usage) so the `!function` directive
points to the module attribute correctly.

In `@docs/public/calibration-compressor.ipynb`:
- Around line 122-132: The notebook calls tokenizer.save_pretrained(model_dir)
but tokenizer is undefined; import or instantiate the tokenizer before this save
step (e.g., load/create the tokenizer tied to model_id) so that tokenizer exists
when saving; ensure the tokenizer variable matches the model used (referencing
tokenizer, model, model_id, and model_dir) and place the tokenizer
initialization before the save_pretrained call.
- Around line 148-170: The notebook has syntax errors from stray spaces in
identifiers: fix the tokenization so uses os.environ (not "os. environ"),
lm_eval.tasks (not "lm_eval. tasks"), and TaskManager( (not "TaskManager (")
when constructing task_manager; update those occurrences in the cell to remove
the extraneous spaces so the imports and the TaskManager(...) call parse
correctly.
- Around line 102-120: The notebook cell is missing imports for
AutoModelForCausalLM and oneshot; add import statements for these symbols (e.g.,
from transformers import AutoModelForCausalLM and from the library that provides
oneshot) in an earlier cell or at the top of this cell so
AutoModelForCausalLM.from_pretrained(...) and oneshot(...) resolve correctly;
ensure the import for oneshot matches the package used elsewhere in the project
(the same module that defines oneshot).
- Around line 36-100: The notebook fails because tokenizer is used inside
preprocess but never defined; add importing and loading of the model tokenizer
(e.g., from transformers import AutoTokenizer) and instantiate tokenizer before
the dataset cell (matching the model used for compression, e.g.,
AutoTokenizer.from_pretrained(model_id, use_fast=... or appropriate kwargs));
ensure the tokenizer exposes apply_chat_template (or wrap/assign a function if
your tokenizer wrapper provides that) so preprocess,
tokenizer.apply_chat_template and tokenizer(...) calls succeed.

In `@docs/public/data-free-compressor.ipynb`:
- Around line 109-131: There are syntax errors from stray spaces in identifiers
and a potential missed usage of the TaskManager: remove the spaces so use
os.environ (not os. environ), lm_eval.tasks (not lm_eval. tasks), and call
TaskManager(...) (not TaskManager ( ... )), then either pass the created
task_manager into lm_eval.simple_evaluate (or add a clear comment that
instantiating task_manager is relied upon for side-effect registration) to make
intent explicit; update any related variable names (task_manager, my-wikitext)
accordingly.
🧹 Nitpick comments (2)
docs/en/installation/tools.mdx (1)

138-140: Consider updating the i18nKey for consistency.

The i18nKey remains "nav_pre_train" while the display text has been updated to "Training". While i18n keys don't technically need to match the display text, maintaining consistency between the key name and its meaning improves maintainability.

Since this is example documentation showing what the merged ConfigMap looks like, verify whether the actual system configuration also needs this i18nKey updated to something like "nav_training".

docs/en/llm-compressor/intro.mdx (1)

14-14: Minor: Use hyphenated "floating-point" as compound adjective.

When used as a compound adjective before a noun, "floating-point" should be hyphenated.

📝 Suggested fix
-- Weight and activation quantization (W8A8) compresses both weights and activations to 8-bit precision, targeting general server scenarios for integer and floating point formats.
+- Weight and activation quantization (W8A8) compresses both weights and activations to 8-bit precision, targeting general server scenarios for integer and floating-point formats.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6090ae2 and a26a899.

📒 Files selected for processing (8)
  • docs/en/installation/tools.mdx
  • docs/en/llm-compressor/how_to/compressor_by_workbench.mdx
  • docs/en/llm-compressor/how_to/evaluate_model.mdx
  • docs/en/llm-compressor/how_to/index.mdx
  • docs/en/llm-compressor/index.mdx
  • docs/en/llm-compressor/intro.mdx
  • docs/public/calibration-compressor.ipynb
  • docs/public/data-free-compressor.ipynb
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-31T02:30:16.360Z
Learnt from: EdisonSu768
Repo: alauda/aml-docs PR: 73
File: docs/en/monitoring_ops/resource_monitoring/how_to/add_monitor_dashboard.mdx:28-45
Timestamp: 2025-12-31T02:30:16.360Z
Learning: In MDX documentation files (e.g., docs/.../*.mdx), when including PromQL code blocks, use bash as the syntax highlighter fallback because the rspress system does not support PromQL highlighting. Ensure the code blocks specify the language as bash (e.g., ```bash) where PromQL would appear, to maintain readability and avoid broken highlighting.

Applied to files:

  • docs/en/llm-compressor/how_to/index.mdx
  • docs/en/installation/tools.mdx
  • docs/en/llm-compressor/how_to/evaluate_model.mdx
  • docs/en/llm-compressor/index.mdx
  • docs/en/llm-compressor/intro.mdx
  • docs/en/llm-compressor/how_to/compressor_by_workbench.mdx
🪛 LanguageTool
docs/en/llm-compressor/how_to/evaluate_model.mdx

[style] ~50-~50: Consider using “inaccessible” to avoid wordiness.
Context: ...m Hugging Face. Because Hugging Face is not accessible from mainland China, you must define a ...

(NOT_ABLE_PREMIUM)

docs/en/llm-compressor/intro.mdx

[grammar] ~14-~14: Use a hyphen to join words.
Context: ...erver scenarios for integer and floating point formats. - Weight pruning, also kn...

(QB_NEW_EN_HYPHEN)


[style] ~15-~15: ‘in conjunction with’ might be wordy. Consider a shorter alternative.
Context: ...is requires fine-tuning, it can be used in conjunction with quantization for further inference acce...

(EN_WORDINESS_PREMIUM_IN_CONJUNCTION_WITH)

🪛 Ruff (0.14.11)
docs/public/calibration-compressor.ipynb

34-34: Undefined name tokenizer

(F821)


38-38: Undefined name tokenizer

(F821)


51-51: Undefined name AutoModelForCausalLM

(F821)


55-55: Undefined name oneshot

(F821)


67-67: Undefined name tokenizer

(F821)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pipelines as Code CI / doc-pr-build-ai
🔇 Additional comments (10)
docs/en/llm-compressor/index.mdx (1)

1-7: LGTM!

Standard index page structure with appropriate frontmatter weight and the <Overview /> component placeholder.

docs/en/llm-compressor/how_to/index.mdx (1)

1-7: LGTM!

Standard how-to section index page following the established documentation pattern.

docs/en/llm-compressor/intro.mdx (1)

7-15: LGTM! Clear introduction to LLM Compressor.

The content provides a good overview of the framework, its integration with Hugging Face and vLLM, and the supported compression techniques. The external links to the GitHub repository and vLLM documentation are helpful references.

docs/en/llm-compressor/how_to/compressor_by_workbench.mdx (2)

1-50: LGTM! Comprehensive workflow documentation.

The guide provides clear, well-structured instructions for using LLM Compressor with the Alauda AI platform. The workflow steps are logically organized, and the conditional sections (optional dataset preparation for data-free vs calibration workflows) are appropriately marked.


9-10: The notebook paths are correctly configured. Both referenced files exist at docs/public/calibration-compressor.ipynb and docs/public/data-free-compressor.ipynb. The absolute paths (/data-free-compressor.ipynb, /calibration-compressor.ipynb) are the correct format for referencing files in the public directory and will resolve properly when the documentation is built.

docs/public/data-free-compressor.ipynb (3)

1-20: LGTM!

Clear introduction with appropriate notes about GPU requirements and lm_eval version dependency.


28-54: LGTM!

GPU detection and quantization recipe setup are well-implemented. The W4A16 scheme with lm_head exclusion is a sensible default configuration.


56-93: LGTM!

Model loading, compression, and saving workflow is correct. Using device_map="auto" and torch_dtype="auto" provides good flexibility.

docs/public/calibration-compressor.ipynb (1)

1-35: LGTM!

Clear introduction explaining the calibration-based workflow and GPTQ algorithm usage.

docs/en/llm-compressor/how_to/evaluate_model.mdx (1)

79-79: The notebook links are correct. The notebooks exist at docs/public/calibration-compressor.ipynb and docs/public/data-free-compressor.ipynb, and the absolute paths /data-free-compressor.ipynb and /calibration-compressor.ipynb follow the standard documentation site pattern where the public assets directory is served from the root. These links are used consistently across multiple documentation files.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.


### Deploy and Use the Compressed Model for Inference

After uploading the compressed model, create a new inference service to deploy and use it. Follow the instructions in [create inference service](../../model_inference/inference_service/functions/inference_service.html#create-inference-service) to complete this step.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find all inference_service files in the repository
find . -type f \( -name "*inference_service*" \) | head -20

Repository: alauda/aml-docs

Length of output: 223


🏁 Script executed:

#!/bin/bash
# List the documentation structure around model_inference
git ls-files | grep -E "model_inference.*inference_service" | head -20

Repository: alauda/aml-docs

Length of output: 909


🏁 Script executed:

#!/bin/bash
# Check the actual path mentioned in the link
ls -la ./docs/en/model_inference/inference_service/functions/ 2>/dev/null || echo "Path with functions subdirectory not found"

Repository: alauda/aml-docs

Length of output: 304


🏁 Script executed:

#!/bin/bash
# Check alternative paths
ls -la ./docs/en/model_inference/inference_service/ 2>/dev/null || echo "Path without functions subdirectory not found"

Repository: alauda/aml-docs

Length of output: 461


Link extension is incorrect: should be .mdx not .html.

This link uses .html but the actual file is docs/en/model_inference/inference_service/functions/inference_service.mdx. Change the extension to .mdx to avoid broken links.

📝 Suggested fix
-After uploading the compressed model, create a new inference service to deploy and use it. Follow the instructions in [create inference service](../../model_inference/inference_service/functions/inference_service.html#create-inference-service) to complete this step.
+After uploading the compressed model, create a new inference service to deploy and use it. Follow the instructions in [create inference service](../../model_inference/inference_service/functions/inference_service.mdx#create-inference-service) to complete this step.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
After uploading the compressed model, create a new inference service to deploy and use it. Follow the instructions in [create inference service](../../model_inference/inference_service/functions/inference_service.html#create-inference-service) to complete this step.
After uploading the compressed model, create a new inference service to deploy and use it. Follow the instructions in [create inference service](../../model_inference/inference_service/functions/inference_service.mdx#create-inference-service) to complete this step.
🤖 Prompt for AI Agents
In `@docs/en/llm-compressor/how_to/compressor_by_workbench.mdx` at line 50, In
docs/en/llm-compressor/how_to/compressor_by_workbench.mdx update the link target
that currently points to
../../model_inference/inference_service/functions/inference_service.html#create-inference-service
so it uses the correct .mdx extension
(../../model_inference/inference_service/functions/inference_service.mdx#create-inference-service);
this fixes the broken link when creating a new inference service after uploading
the compressed model.

Comment on lines +31 to +46
:::note
As of the latest release, the `lm_eval` library does not natively support custom evaluation tasks. To enable this capability, you must manually apply a small patch to the `lm_eval` source code.

Edit the following file:

`~/.venv/lib/python3.11/site-packages/lm_eval/tasks/__init__.py`

Locate approximately line 683 and update the code as shown below. For additional context, see this upstream pull request: [PR #3436](https://github.com/EleutherAI/lm-evaluation-harness/pull/3436/files).

```python
try:
    relative_yaml_path = yaml_path.relative_to(lm_eval_tasks_path)
except ValueError:
    relative_yaml_path = yaml_path
```
:::

⚠️ Potential issue | 🟡 Minor

Hardcoded Python version may cause confusion.

The path ~/.venv/lib/python3.11/site-packages/... assumes Python 3.11. Users with different Python versions will have a different path. Consider making this more generic.

📝 Suggested improvement
 Edit the following file:

-`~/.venv/lib/python3.11/site-packages/lm_eval/tasks/__init__.py`
+`~/.venv/lib/python<VERSION>/site-packages/lm_eval/tasks/__init__.py`
+
+Replace `<VERSION>` with your installed Python version (e.g., `3.10`, `3.11`).
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
:::note
As of the latest release, the `lm_eval` library does not natively support custom evaluation tasks. To enable this capability, you must manually apply a small patch to the `lm_eval` source code.
Edit the following file:
`~/.venv/lib/python3.11/site-packages/lm_eval/tasks/__init__.py`
Locate approximately line 683 and update the code as shown below. For additional context, see this upstream pull request: [PR #3436](https://github.com/EleutherAI/lm-evaluation-harness/pull/3436/files).
```python
try:
    relative_yaml_path = yaml_path.relative_to(lm_eval_tasks_path)
except ValueError:
    relative_yaml_path = yaml_path
```
:::
:::note
As of the latest release, the `lm_eval` library does not natively support custom evaluation tasks. To enable this capability, you must manually apply a small patch to the `lm_eval` source code.
Edit the following file:
`~/.venv/lib/python<VERSION>/site-packages/lm_eval/tasks/__init__.py`
Replace `<VERSION>` with your installed Python version (e.g., `3.10`, `3.11`).
Locate approximately line 683 and update the code as shown below. For additional context, see this upstream pull request: [PR `#3436`](https://github.com/EleutherAI/lm-evaluation-harness/pull/3436/files).
🤖 Prompt for AI Agents
In `@docs/en/llm-compressor/how_to/evaluate_model.mdx` around lines 31 - 46, The
docs currently hardcode a Python version in the site-packages path; update the
instructions to avoid a specific Python version by telling users to edit the
lm_eval/tasks/__init__.py file inside their active virtualenv's site-packages
directory (do not hardcode ~/.venv/lib/python3.11/...), and instruct them to
locate the block that computes relative_yaml_path (the try/except around
yaml_path.relative_to(lm_eval_tasks_path)) and apply the suggested
ValueError-handling change there; also add a short note directing users to use
their Python/virtualenv tools to discover their site-packages location rather
than assuming a path.
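
A version-agnostic way to locate that file is to ask the active environment directly; a small sketch, assuming `lm_eval` is importable in the virtualenv being patched:

```python
# Print the path of lm_eval's tasks/__init__.py in the current environment
import inspect
import lm_eval.tasks

print(inspect.getfile(lm_eval.tasks))
```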

Comment on lines +54 to +73
```yaml
task: my-wikitext
dataset_path: /home/jovyan/wikitext_document_level #[!code callout]
dataset_name: wikitext-2-raw-v1
output_type: loglikelihood_rolling
training_split: train
validation_split: validation
test_split: test
doc_to_text: ''''
doc_to_target: !function preprocess_wikitext.wikitext_detokenizer
process_results: !function preprocess_wikitext-process_results
should_decontaminate: true
doc_to_decontamination_query: "{{page}}"
metric_list:
- metric: word_perplexity
- metric: byte_perplexity
- metric: bits_per_byte
metadata:
  version: 1.0
```

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's find and examine the specific file mentioned
find . -name "evaluate_model.mdx" -type f

Repository: alauda/aml-docs

Length of output: 109


🏁 Script executed:

# Read the specific file to verify the actual content at lines 54-73
cat -n ./docs/en/llm-compressor/how_to/evaluate_model.mdx | sed -n '50,75p'

Repository: alauda/aml-docs

Length of output: 1216


🏁 Script executed:

# Search for the preprocess_wikitext module or file
find . -name "*preprocess_wikitext*" -o -name "*wikitext*" | grep -E "\.(py|yaml|yml|ts|js)$" | head -20

Repository: alauda/aml-docs

Length of output: 41


🏁 Script executed:

# Search for any references to preprocess_wikitext or related content
rg "preprocess_wikitext" --type md --type mdx --type yaml

Repository: alauda/aml-docs

Length of output: 86


🏁 Script executed:

# Search for preprocess_wikitext references without file type filter
rg "preprocess_wikitext"

Repository: alauda/aml-docs

Length of output: 285


🏁 Script executed:

# Check broader context around the YAML snippet
cat -n ./docs/en/llm-compressor/how_to/evaluate_model.mdx | sed -n '40,90p'

Repository: alauda/aml-docs

Length of output: 2107


🌐 Web query:

lm_eval YAML !function directive syntax module path naming

💡 Result:

Short answer: use the YAML tag !function followed by a Python import-style path to the class or callable (module and attribute separated by a dot), e.g.

  • class: !function task.SQuAD2

Details / rules:

  • The string is treated as a Python import path (module.attribute): lm-eval will import the module and fetch that attribute (a Task class or callable). [1][2]
  • The module must be importable from the harness run context — either a module inside lm_eval (e.g. task.*) or in a directory you add with --include_path / by placing YAML/python files under lm_eval/tasks. [2][3]
  • You can pass additional config/arguments via fields like recipe (parsed/forwarded to the class constructor / ConfigurableTask). [2]
  • Examples in the docs show group/task entries using class: !function task.YourClass and using recipe to pass custom args. [2][1]

Sources: lm-eval new task guide / docs and examples. [1][2][3]

References
[1] New Task Guide — lm-evaluation-harness (YAML !function examples).
[2] lm-eval docs / examples showing class: !function task.SQuAD2 and group/task usage.
[3] lm-eval task configuration / include_path notes.


Fix incorrect function reference syntax in YAML example.

Line 64 uses preprocess_wikitext-process_results (with hyphen), which is incorrect. The !function directive in lm_eval requires Python import-style paths with dots separating module and attributes. This should be:

Diff
-process_results: !function preprocess_wikitext-process_results
+process_results: !function preprocess_wikitext.process_results
🤖 Prompt for AI Agents
In `@docs/en/llm-compressor/how_to/evaluate_model.mdx` around lines 54 - 73, The
YAML example uses an incorrect function reference syntax: replace the hyphenated
`!function preprocess_wikitext-process_results` with the proper import-style
dotted path `!function preprocess_wikitext.process_results` (matching the other
`!function preprocess_wikitext.wikitext_detokenizer` usage) so the `!function`
directive points to the module attribute correctly.
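
For context, both `!function` entries point at a helper module (here `preprocess_wikitext.py`) that has to live next to the task YAML. A rough, simplified sketch of what such a module typically contains for a wikitext-style perplexity task (the real upstream helper performs more detokenization steps):

```python
import re


def wikitext_detokenizer(doc):
    # Undo common wikitext tokenization artifacts (heavily simplified)
    string = doc["page"]
    string = re.sub(r" '(\w)", r"'\1", string)
    string = string.replace(" n't", "n't").replace(" .", ".").replace(" ,", ",")
    return string


def process_results(doc, results):
    # lm_eval passes the rolling loglikelihood; report it with word/byte counts
    # so the perplexity metrics declared in the YAML can be aggregated.
    (loglikelihood,) = results
    words = len(re.split(r"\s+", doc["page"]))
    num_bytes = len(doc["page"].encode("utf-8"))
    return {
        "word_perplexity": (loglikelihood, words),
        "byte_perplexity": (loglikelihood, num_bytes),
        "bits_per_byte": (loglikelihood, num_bytes),
    }
```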

Comment on lines +36 to +100
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"\n",
"use_gpu = torch.cuda.is_available()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# We will use a new recipe running GPTQ (https://arxiv.org/abs/2210.17323)\n",
"# to reduce error caused by quantization. GPTQ requires a calibration dataset.\n",
"from llmcompressor.modifiers.quantization import GPTQModifier\n",
"\n",
"# model to compress\n",
"model_id = \"./TinyLlama/TinyLlama-1.1B-Chat-v1.0\"\n",
"recipe = GPTQModifier(targets=\"Linear\", scheme=\"W4A16\", ignore=[\"lm_head\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from datasets import load_dataset\n",
"\n",
"# Create the calibration dataset, using Huggingface datasets API\n",
"dataset_id = \"./ultrachat_200k\"\n",
"\n",
"# Select number of samples. 512 samples is a good place to start.\n",
"# Increasing the number of samples can improve accuracy.\n",
"num_calibration_samples = 512 if use_gpu else 4\n",
"max_sequence_length = 2048 if use_gpu else 16\n",
"\n",
"# Load dataset\n",
"ds = load_dataset(dataset_id, split=\"train_sft\")\n",
"# Shuffle and grab only the number of samples we need\n",
"ds = ds.shuffle(seed=42).select(range(num_calibration_samples))\n",
"\n",
"\n",
"# Preprocess and tokenize into format the model uses\n",
"def preprocess(example):\n",
" text = tokenizer.apply_chat_template(\n",
" example[\"messages\"],\n",
" tokenize=False,\n",
" )\n",
" return tokenizer(\n",
" text,\n",
" padding=False,\n",
" max_length=max_sequence_length,\n",
" truncation=True,\n",
" add_special_tokens=False,\n",
" )\n",
"\n",
"\n",
"ds = ds.map(preprocess, remove_columns=ds.column_names)"
]

⚠️ Potential issue | 🔴 Critical

Missing tokenizer definition before use.

The preprocess function references tokenizer (lines 86 and 90), but tokenizer is never defined in this notebook. In the data-free notebook, the tokenizer is loaded with AutoTokenizer.from_pretrained(). This cell will fail with a NameError.

🐛 Proposed fix - add tokenizer import and loading

Add a cell before the dataset loading cell, or modify the existing cell to include:

+"from transformers import AutoTokenizer\n",
+"\n",
 "from datasets import load_dataset\n",
 "\n",
 "# Create the calibration dataset, using Huggingface datasets API\n",
 "dataset_id = \"./ultrachat_200k\"\n",
+"\n",
+"# Load tokenizer for preprocessing\n",
+"tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)\n",
🧰 Tools
🪛 Ruff (0.14.11)

38-38: Undefined name tokenizer

(F821)


51-51: Undefined name AutoModelForCausalLM

(F821)


55-55: Undefined name oneshot

(F821)


67-67: Undefined name tokenizer

(F821)

🤖 Prompt for AI Agents
In `@docs/public/calibration-compressor.ipynb` around lines 36 - 100, The notebook
fails because tokenizer is used inside preprocess but never defined; add
importing and loading of the model tokenizer (e.g., from transformers import
AutoTokenizer) and instantiate tokenizer before the dataset cell (matching the
model used for compression, e.g., AutoTokenizer.from_pretrained(model_id,
use_fast=... or appropriate kwargs)); ensure the tokenizer exposes
apply_chat_template (or wrap/assign a function if your tokenizer wrapper
provides that) so preprocess, tokenizer.apply_chat_template and tokenizer(...)
calls succeed.

Comment on lines +102 to +120
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# oneshot modifies model in-place, so reload\n",
"model = AutoModelForCausalLM.from_pretrained(\n",
" model_id, device_map=\"auto\", torch_dtype=\"auto\"\n",
")\n",
"# run oneshot again, with dataset\n",
"model = oneshot(\n",
" model=model,\n",
" dataset=ds,\n",
" recipe=recipe,\n",
" max_seq_length=max_sequence_length,\n",
" num_calibration_samples=num_calibration_samples,\n",
")"
]

⚠️ Potential issue | 🔴 Critical

Missing imports for AutoModelForCausalLM and oneshot.

This cell uses AutoModelForCausalLM and oneshot which are not imported in this notebook. These were imported in the data-free notebook but this notebook appears to be standalone.

🐛 Proposed fix - add missing imports

Add imports at the beginning of this cell or in a prior cell:

+"from transformers import AutoModelForCausalLM\n",
+"from llmcompressor import oneshot\n",
+"\n",
 "# oneshot modifies model in-place, so reload\n",
 "model = AutoModelForCausalLM.from_pretrained(\n",
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# oneshot modifies model in-place, so reload\n",
"model = AutoModelForCausalLM.from_pretrained(\n",
" model_id, device_map=\"auto\", torch_dtype=\"auto\"\n",
")\n",
"# run oneshot again, with dataset\n",
"model = oneshot(\n",
" model=model,\n",
" dataset=ds,\n",
" recipe=recipe,\n",
" max_seq_length=max_sequence_length,\n",
" num_calibration_samples=num_calibration_samples,\n",
")"
]
from transformers import AutoModelForCausalLM
from llmcompressor import oneshot
# oneshot modifies model in-place, so reload
model = AutoModelForCausalLM.from_pretrained(
model_id, device_map="auto", torch_dtype="auto"
)
# run oneshot again, with dataset
model = oneshot(
model=model,
dataset=ds,
recipe=recipe,
max_seq_length=max_sequence_length,
num_calibration_samples=num_calibration_samples,
)
🤖 Prompt for AI Agents
In `@docs/public/calibration-compressor.ipynb` around lines 102 - 120, The
notebook cell is missing imports for AutoModelForCausalLM and oneshot; add
import statements for these symbols (e.g., from transformers import
AutoModelForCausalLM and from the library that provides oneshot) in an earlier
cell or at the top of this cell so AutoModelForCausalLM.from_pretrained(...) and
oneshot(...) resolve correctly; ensure the import for oneshot matches the
package used elsewhere in the project (the same module that defines oneshot).

Comment on lines +122 to +132
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Save model and tokenizer\n",
"model_dir = \"./\" + model_id.split(\"/\")[-1] + \"-GPTQ-W4A16\"\n",
"model.save_pretrained(model_dir)\n",
"tokenizer.save_pretrained(model_dir);"
]

⚠️ Potential issue | 🔴 Critical

Undefined tokenizer in save cell.

Line 131 calls tokenizer.save_pretrained(model_dir) but tokenizer is still undefined due to the missing import noted earlier.
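
If the tokenizer is loaded in the earlier dataset cell (as proposed above), this cell works as written; otherwise a minimal fix, assuming `model_id` and `model_dir` from the surrounding cells, is:

```python
from transformers import AutoTokenizer

# Load the tokenizer that matches the compressed model, then save it with the model
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.save_pretrained(model_dir)
```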

🤖 Prompt for AI Agents
In `@docs/public/calibration-compressor.ipynb` around lines 122 - 132, The
notebook calls tokenizer.save_pretrained(model_dir) but tokenizer is undefined;
import or instantiate the tokenizer before this save step (e.g., load/create the
tokenizer tied to model_id) so that tokenizer exists when saving; ensure the
tokenizer variable matches the model used (referencing tokenizer, model,
model_id, and model_dir) and place the tokenizer initialization before the
save_pretrained call.

Comment on lines +148 to +170
"source": [
"import os\n",
"os. environ [\"VLLM_USE_V1\"] = \"0\"\n",
"\n",
"import lm_eval\n",
"from lm_eval.utils import make_table\n",
"\n",
"from lm_eval. tasks import TaskManager\n",
"task_manager = TaskManager (include_path=\"./my-wikitext.yaml\")\n",
"\n",
"results = lm_eval.simple_evaluate(\n",
" model=\"vllm\" if use_gpu else \"hf\",\n",
" model_args={\n",
" \"pretrained\": model_dir,\n",
" \"add_bos_token\": True,\n",
" \"device\": \"auto\",\n",
" \"gpu_memory_utilization\": 0.8,\n",
" },\n",
" tasks=[\"my-wikitext\"],\n",
" batch_size=\"auto\" if use_gpu else 4,\n",
" limit=None if use_gpu else 4,\n",
")"
]

⚠️ Potential issue | 🔴 Critical

Same syntax errors as data-free notebook.

Lines 150, 155, and 156 contain the same extraneous spaces that will cause syntax errors:

  • os. environ → os.environ
  • lm_eval. tasks → lm_eval.tasks
  • TaskManager ( → TaskManager(
🐛 Proposed fix
 "import os\n",
-"os. environ [\"VLLM_USE_V1\"] = \"0\"\n",
+"os.environ[\"VLLM_USE_V1\"] = \"0\"\n",
 "\n",
 "import lm_eval\n",
 "from lm_eval.utils import make_table\n",
 "\n",
-"from lm_eval. tasks import TaskManager\n",
-"task_manager = TaskManager (include_path=\"./my-wikitext.yaml\")\n",
+"from lm_eval.tasks import TaskManager\n",
+"task_manager = TaskManager(include_path=\"./my-wikitext.yaml\")\n",
🤖 Prompt for AI Agents
In `@docs/public/calibration-compressor.ipynb` around lines 148 - 170, The
notebook has syntax errors from stray spaces in identifiers: fix the
tokenization so uses os.environ (not "os. environ"), lm_eval.tasks (not
"lm_eval. tasks"), and TaskManager( (not "TaskManager (") when constructing
task_manager; update those occurrences in the cell to remove the extraneous
spaces so the imports and the TaskManager(...) call parse correctly.

Comment on lines +109 to +131
"source": [
"import os\n",
"os. environ [\"VLLM_USE_V1\"] = \"0\"\n",
"\n",
"import lm_eval\n",
"from lm_eval.utils import make_table\n",
"\n",
"from lm_eval. tasks import TaskManager\n",
"task_manager = TaskManager (include_path=\"./my-wikitext.yaml\")\n",
"\n",
"results = lm_eval.simple_evaluate(\n",
" model=\"vllm\" if use_gpu else \"hf\",\n",
" model_args={\n",
" \"pretrained\": model_dir,\n",
" \"add_bos_token\": True,\n",
" \"device\": \"auto\",\n",
" \"gpu_memory_utilization\": 0.8,\n",
" },\n",
" tasks=[\"my-wikitext\"],\n",
" batch_size=\"auto\" if use_gpu else 4,\n",
" limit=None if use_gpu else 4,\n",
")"
]

⚠️ Potential issue | 🔴 Critical

Fix syntax errors with extraneous spaces.

Lines 111, 116, and 117 contain spaces within identifiers and function calls that will cause syntax errors:

  • os. environ should be os.environ
  • lm_eval. tasks should be lm_eval.tasks
  • TaskManager ( should be TaskManager(

Also, task_manager is instantiated but not explicitly passed to simple_evaluate. If the side-effect of registering the task path is intended, consider adding a brief comment to clarify this.

🐛 Proposed fix
 "import os\n",
-"os. environ [\"VLLM_USE_V1\"] = \"0\"\n",
+"os.environ[\"VLLM_USE_V1\"] = \"0\"\n",
 "\n",
 "import lm_eval\n",
 "from lm_eval.utils import make_table\n",
 "\n",
-"from lm_eval. tasks import TaskManager\n",
-"task_manager = TaskManager (include_path=\"./my-wikitext.yaml\")\n",
+"from lm_eval.tasks import TaskManager\n",
+"# Initialize TaskManager to register custom task path\n",
+"task_manager = TaskManager(include_path=\"./my-wikitext.yaml\")\n",
🤖 Prompt for AI Agents
In `@docs/public/data-free-compressor.ipynb` around lines 109 - 131, There are
syntax errors from stray spaces in identifiers and a potential missed usage of
the TaskManager: remove the spaces so use os.environ (not os. environ),
lm_eval.tasks (not lm_eval. tasks), and call TaskManager(...) (not TaskManager (
... )), then either pass the created task_manager into lm_eval.simple_evaluate
(or add a clear comment that instantiating task_manager is relied upon for
side-effect registration) to make intent explicit; update any related variable
names (task_manager, my-wikitext) accordingly.

In JupyterLab, open the **Launcher** page, select the **Terminal** tile, and run the following commands to install the required dependencies:

```bash
/.venv/bin/python -m pip install vllm==0.8.5 -i https://pypi.tuna.tsinghua.edu.cn/simple && #[!code callout]
Contributor

Should this be changed to the version that our release supports?
