
Cz/bitflip finetune #30

Merged
ChengZhang-98 merged 8 commits into master from cz/bitflip-finetune on Mar 2, 2026
Conversation

@ChengZhang-98 (Contributor) commented Feb 28, 2026

This pull request adds support for bitflip-aware LoRA fine-tuning of large language models (LLMs), specifically Llama-3.1-8B, and documents the workflow and results. The changes include new training and evaluation scripts, configuration files for bitflip and LoRA parameters, a step-by-step tutorial, and updates to the project documentation with milestone results. The workflow injects random bitflip noise during training and evaluation so that the LoRA adapters learn to be robust to hardware-level errors; the results show that LoRA fine-tuning dramatically reduces the negative impact of bitflip noise.
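For intuition only (this is not the exact transform implemented in this PR), bitflip noise injection can be sketched as reinterpreting fp16 weights as integers and XOR-ing in randomly drawn bits. The function name, the fp16 assumption, and the flip probability below are all illustrative.

```python
import torch

def inject_bitflips(weight: torch.Tensor, p_flip: float = 1e-4) -> torch.Tensor:
    """Hedged sketch: return a copy of `weight` whose fp16 encoding has each bit
    flipped independently with probability `p_flip`. Not the PR's implementation."""
    w16 = weight.detach().to(torch.float16)
    bits = w16.view(torch.int16)
    # One Bernoulli(p_flip) draw per element per bit position, packed into an XOR mask.
    flips = torch.rand(*w16.shape, 16, device=w16.device) < p_flip
    bit_values = 2 ** torch.arange(16, device=w16.device)      # int64 place values
    xor_mask = (flips.to(torch.int64) * bit_values).sum(dim=-1).to(torch.int16)
    return (bits ^ xor_mask).view(torch.float16).to(weight.dtype)

# Toy usage on a small random matrix; during training this kind of noise would
# corrupt the frozen base weights while the LoRA adapters stay noise-free and
# learn to compensate for the corruption.
noisy = inject_bitflips(torch.randn(256, 256), p_flip=1e-3)
```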

Bitflip-aware LoRA fine-tuning workflow and documentation:

  • Added a detailed tutorial (clm-bitflip-lora-finetune.md) explaining the theory, configuration, step-by-step usage, and results of bitflip-aware LoRA fine-tuning for Llama-3.1-8B, including training curves and baseline comparisons.
  • Introduced parameterized shell scripts for fine-tuning (fine-tune-bitflip-clm.sh) and evaluation-only runs (eval-bitflip-no-finetune.sh, eval-no-biflip-no-finetune.sh), with automatic calculation of training steps based on model size and batch configuration. [1] [2] [3]
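The step arithmetic itself lives in fine-tune-bitflip-clm.sh; the Python rendering below only illustrates the kind of calculation involved, and every concrete number in it is a placeholder rather than a value taken from the script.

```python
import math

# Hedged sketch of deriving max_steps from dataset size and batch configuration.
# All concrete numbers here are assumed placeholders, not values from the PR scripts.
num_train_tokens = 10_000_000     # tokens in the training split (assumed)
seq_len = 2048                    # context length per sample (assumed)
per_device_batch_size = 1
grad_accum_steps = 16
num_gpus = 4

tokens_per_step = seq_len * per_device_batch_size * grad_accum_steps * num_gpus
max_steps = math.ceil(num_train_tokens / tokens_per_step)
print(f"tokens/step = {tokens_per_step}, max_steps = {max_steps}")
```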

Configuration and reproducibility:

  • Added TOML configuration files for bitflip-aware (transform_cfg.toml) and baseline LoRA (transform_cfg_baseline.toml) setups, specifying bitflip probabilities and LoRA parameters. [1] [2]
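The TOML files are the source of truth for these hyperparameters; purely as an illustration of how such a config might be consumed (the file name matches the PR, but the table and key names are assumptions), it could be read like this:

```python
import tomllib  # Python 3.11+; use the third-party `tomli` package on older versions

# Hedged sketch: read bitflip and LoRA hyperparameters from the config.
# The key names ("bitflip.p_flip", "lora.r", "lora.alpha") are assumptions,
# not necessarily the schema actually used in transform_cfg.toml.
with open("transform_cfg.toml", "rb") as f:
    cfg = tomllib.load(f)

p_flip = cfg.get("bitflip", {}).get("p_flip", 0.0)
lora_r = cfg.get("lora", {}).get("r", 16)
lora_alpha = cfg.get("lora", {}).get("alpha", 32)
print(f"bitflip p={p_flip}, LoRA r={lora_r}, alpha={lora_alpha}")
```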

Results visualization and analysis:

  • Added a Python script (plot_train_loss.py) to visualize training loss curves from W&B CSV exports, highlighting the effectiveness of bitflip-aware LoRA fine-tuning.
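plot_train_loss.py itself handles the W&B export; a minimal sketch of the same idea, with assumed CSV file names and column names, looks like this:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hedged sketch: compare training-loss curves for bitflip-aware LoRA vs. baseline LoRA.
# The CSV file names and column names ("Step", "train/loss") are assumptions about
# the W&B export layout and are not guaranteed to match plot_train_loss.py.
runs = {
    "bitflip-aware LoRA": "wandb_bitflip_lora.csv",
    "baseline LoRA": "wandb_baseline_lora.csv",
}

fig, ax = plt.subplots(figsize=(6, 4))
for label, path in runs.items():
    df = pd.read_csv(path)
    ax.plot(df["Step"], df["train/loss"], label=label)

ax.set_xlabel("training step")
ax.set_ylabel("training loss")
ax.legend()
fig.savefig("train_loss.png", dpi=150)
```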

Documentation and milestone updates:

  • Updated the main documentation and README to reflect the new milestone: successful bitflip-aware LoRA fine-tuning of Llama-3.1-8B, with a summary table and links to the tutorial and results (including perplexity reduction from 1008.95 to 11.01). [1] [2]
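For context on the reported numbers, perplexity is the exponential of the mean token-level cross-entropy loss. The loss values below are simply back-derived from the reported perplexities to show that relationship; they are not measurements from the PR.

```python
import math

# Hedged sketch: perplexity = exp(mean token cross-entropy loss).
# Loss values are back-computed from the reported perplexities for illustration only.
for label, mean_ce_loss in [("no fine-tuning, bitflip noise on", 6.917),
                            ("bitflip-aware LoRA, bitflip noise on", 2.399)]:
    print(f"{label}: perplexity = {math.exp(mean_ce_loss):.2f}")
```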

@ChengZhang-98 ChengZhang-98 merged commit 940ada7 into master Mar 2, 2026
@ChengZhang-98 ChengZhang-98 deleted the cz/bitflip-finetune branch March 2, 2026 13:35