MoE-LPR-NPU

This repository contains the NPU-supported implementation of:

MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing

Overview

We propose MoE-LPR (Mixture-of-Experts with Language Priors Routing),
a two-stage training framework that enhances an LLM’s multilingual capability.

  1. Stage 1 – Post-pretraining with MoE:
    Upcycle a dense model into a Mixture-of-Experts (MoE) architecture, freezing the original parameters while adding new experts.
    This stage focuses on improving new-language capabilities without using original-language data.

  2. Stage 2 – Review with LPR:
    Train only the router using language priors, guiding the model to retain knowledge of the original languages while maintaining new-language performance.

Evaluations show that MoE-LPR achieves superior performance and strong resistance to catastrophic forgetting across multiple multilingual benchmarks.

Stage 1: Post-pretraining with MoE

We upcycle the dense model into an MoE structure.
The parameters of the original model are frozen, preserving previously learned knowledge, while new experts and routers are trainable.
This design allows the model to reuse old knowledge or store new knowledge adaptively.
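
For intuition only, here is a minimal PyTorch-style sketch of this idea at the level of a single FFN layer. It is not the repository's implementation (the actual conversion is done by upcycling.py below), and names such as UpcycledMoELayer are hypothetical: expert 0 is the frozen original FFN, the new experts are trainable copies of it, and the router is a small trainable linear layer.

import copy
import torch
import torch.nn as nn

class UpcycledMoELayer(nn.Module):
    # Hypothetical sketch: expert 0 is the frozen original FFN, experts 1..N-1
    # are trainable copies of it, and a linear router picks top-k experts per token.
    def __init__(self, original_ffn: nn.Module, hidden_size: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            [original_ffn] + [copy.deepcopy(original_ffn) for _ in range(num_experts - 1)]
        )
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.top_k = top_k
        for p in self.experts[0].parameters():    # freeze the original expert
            p.requires_grad = False

    def forward(self, x):                          # x: (batch, seq, hidden)
        probs = self.router(x).softmax(dim=-1)     # (batch, seq, num_experts)
        weights, idx = probs.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e).unsqueeze(-1).to(x.dtype)
                out = out + mask * weights[..., k:k + 1] * expert(x)
        return out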

Stage 2: Review with LPR

After post-pretraining, the router might misassign experts for previously supported languages.
To address this, we train only the router using language priors that encourage:

  • Original-language tokens → routed to frozen experts
  • Expanded-language tokens → routing unchanged

We use a cross-entropy loss guided by token languages.
See our paper for detailed mathematical formulations.
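
As a rough sketch only (the exact objective is given in the paper), the language-priors term can be thought of as a cross-entropy on the router logits that pushes original-language tokens toward the frozen original expert (assumed to be index 0 here) and ignores new-language tokens:

import torch
import torch.nn.functional as F

def lpr_loss(router_logits: torch.Tensor, old_lang_mask: torch.Tensor) -> torch.Tensor:
    # router_logits: (num_tokens, num_experts) raw router scores
    # old_lang_mask: (num_tokens,) bool, True for tokens of originally supported languages
    if not old_lang_mask.any():
        return router_logits.new_zeros(())            # no old-language tokens in the batch
    old_logits = router_logits[old_lang_mask]         # keep only old-language tokens
    target = old_logits.new_zeros(old_logits.size(0), dtype=torch.long)  # expert 0 = frozen original
    return F.cross_entropy(old_logits, target)

During Stage 2 only the router is trainable, and a term like this would be added to the language-modeling loss with the weight given by --lpr_loss_coef (listed under Stage 2 below).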

Main Results

MoE-LPR shows consistent improvements across benchmarks like ARC-Challenge, HellaSwag, and Belebele,
demonstrating superior multilingual learning and robustness against catastrophic forgetting.

Additionally, experiments on LLaMA3-8B confirm that the method scales to larger models; see the paper for the full results.

Train MoE-LPR

Below are the instructions for training MoE-LPR using this repository.

Tool Installation

Ensure these commands exist on your NPU server:

which lspci modprobe udevadm modinfo

Expected output:

/usr/bin/lspci
/usr/sbin/modprobe
/usr/bin/udevadm
/usr/sbin/modinfo

If missing, install them (with sudo if not root):

sudo apt update && sudo apt install -y pciutils kmod udev libaio-dev

Ascend Env Installation

  • Ensure CANN toolkit and CANN kernels are installed.
  • If not, follow the Ascend installation guide for your server type (inference or training); a typical environment check is shown below.
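
As a reference (paths can vary by installation), the CANN environment is typically activated and the driver checked with:

# Typical CANN location; adjust if installed elsewhere
source /usr/local/Ascend/ascend-toolkit/set_env.sh
npu-smi info    # lists the NPU devices if the driver is installed correctly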

Python Env Installation

Python 3.10 (recommended)

pip install -r requirements.txt

For DeepSpeed installation, see the Ascend open-source DeepSpeed installation guide (昇腾开源 - Deepspeed 安装指南).
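
To confirm that PyTorch can see the NPUs (assuming torch and torch_npu are installed via requirements.txt), a quick check:

import torch
import torch_npu  # Ascend plugin that registers the "npu" device with PyTorch

print(torch.npu.is_available())   # True once the driver and CANN environment are set up
print(torch.npu.device_count())   # number of visible NPU devices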

Upcycle Your Model

Convert a pretrained model to llama_moelpr type via MoE upcycling:

# Under MoE-LPR/
python3 upcycling.py \
  --model_path path/to/your/model \
  --output_path path/to/output/dir \
  --num_experts 4

Train

Toy Example

We provide example data under: MoE-LPR/LLaMA-Factory/data/moelpr_examples

  • ja.jsonl → new language corpus
  • zh.jsonl → original language corpus

Stage 1:

bash scripts/stage1.sh

Stage 2:

bash scripts/stage2.sh

Data

Place monolingual documents here:

MoE-LPR/LLaMA-Factory/data

Each file should be .jsonl or .json, containing one document per JSON object:

{"text": "one document"}

Then register it in:

MoE-LPR/LLaMA-Factory/data/dataset_info.json

Example:

"hu_1b": {
  "file_name": "hu_part1b_00000.jsonl",
  "file_sha1": "e70375e28eda542a90c68213640cc371898ce184",
  "language": "new",
  "columns": {
    "prompt": "text"
  }
}

The "language" field can be "old" or "new" and is used during Stage 2 training; whether the language mask is generated from it is controlled by the hyperparameter generate_lang_mask.
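
For illustration, a small helper (the document list is hypothetical; the file name follows the example above) that writes documents into this format and prints the SHA-1 for the file_sha1 field:

import hashlib
import json

docs = ["first document ...", "second document ..."]    # your monolingual corpus
out_path = "LLaMA-Factory/data/hu_part1b_00000.jsonl"   # file registered in dataset_info.json

with open(out_path, "w", encoding="utf-8") as f:
    for doc in docs:
        f.write(json.dumps({"text": doc}, ensure_ascii=False) + "\n")

with open(out_path, "rb") as f:                          # SHA-1 for the "file_sha1" field
    print(hashlib.sha1(f.read()).hexdigest())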

Post-pretraining

Execute Stage 1:

bash scripts/stage1.sh

Key hyperparameters:

  • --moe_num_experts: total number of experts
  • --topk: number of experts selected per token
  • --aux_loss_coef: weight of the load-balancing loss (a generic formulation is sketched below)

Logs include per-expert scores and load-balancing metrics. Full details are in the paper.
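
The exact load-balancing term is defined in the paper; as a point of reference, the widely used Switch-Transformer-style auxiliary loss that --aux_loss_coef would typically weight looks like this (the repository's formulation may differ):

import torch
import torch.nn.functional as F

def load_balancing_loss(router_probs: torch.Tensor, top1_expert: torch.Tensor, num_experts: int) -> torch.Tensor:
    # router_probs: (num_tokens, num_experts) softmaxed router probabilities
    # top1_expert:  (num_tokens,) index of the expert each token was dispatched to
    dispatch_frac = F.one_hot(top1_expert, num_experts).float().mean(dim=0)  # fraction of tokens per expert
    prob_frac = router_probs.mean(dim=0)                                     # mean routing probability per expert
    return num_experts * torch.sum(dispatch_frac * prob_frac)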

Language Priors Router Training

Execute Stage 2:

bash scripts/stage2.sh

Key hyperparameters:

  • --lpr_loss_coef: weight of LPR loss
  • --max_samples: number of documents per language

Logs show LPR loss and average expert selections.

Evaluate

We use PEFT to manage the trained parameters. Load a checkpoint by applying the PEFT weights on top of the upcycled base model (paths below are placeholders):

from transformers import AutoModelForCausalLM
from peft import PeftModel

base_path = "path/to/upcycled/model"   # output of upcycling.py
peft_path = "path/to/peft/checkpoint"  # Stage 1 / Stage 2 training output
model = AutoModelForCausalLM.from_pretrained(base_path)
model = PeftModel.from_pretrained(model, peft_path)

Evaluation via lm-evaluation-harness:

lm_eval --model hf \
        --model_args pretrained=$BASE_MODEL_PATH,peft=$PEFT_MODEL_PATH,dtype="float16" \
        --tasks hellaswag_tr \
        --device cuda:0 \
        --num_fewshot 10 \
        --output_path $OUTPUT_PATH \
        --batch_size $BATCH_SIZE

Bugs or Questions?

If you have questions about the paper or code, contact Hao Zhou (zhouh@smail.nju.edu.cn) or Zhijun Wang (wangzj@smail.nju.edu.cn).

Or open an issue on GitHub with detailed information.

For NPU-related questions, contact Shen Yunzhi (shenyunzhi@smail.nju.edu.cn).

Citation

Please cite MoE-LPR if you use it in your work:

@article{zhou2024MoE-LPR,
  author = {Zhou, Hao and Wang, Zhijun and Huang, Shujian and Huang, Xin and Han, Xue and Feng, Junlan and Deng, Chao and Luo, Weihua and Chen, Jiajun},
  journal = {arXiv preprint arXiv:2408.11396},
  title = {MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing},
  url = {https://arxiv.org/abs/2408.11396},
  year = {2024}
}
