This repository contains code and sample data for supervised fine-tuning of the LLaDA-8B model. LLaDA (Large Language Diffusion with mAsking) is a diffusion-based language model that offers an alternative to traditional autoregressive models.
- `sft_data/conversations.json`: Sample conversation data for fine-tuning
- `preprocess_sft_data.py`: Script to preprocess the conversation data
- `finetune_llada.py`: Main script for fine-tuning LLaDA
- `inference_example.py`: Script to test the fine-tuned model
- `run_fine_tuning.sh`: Shell script to run the entire fine-tuning pipeline
- **Prepare Your Data**:
  - Place your conversation data in the `sft_data/conversations.json` file
  - The data should follow the format in the sample file (see the inspection snippet after this list)
- **Run the Fine-Tuning Pipeline**:

  ```bash
  chmod +x run_fine_tuning.sh
  ./run_fine_tuning.sh
  ```

- **Customize Fine-Tuning Parameters**:
  - Edit `run_fine_tuning.sh` to adjust parameters like model name, batch size, learning rate, etc.
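The authoritative schema is whatever `sft_data/conversations.json` contains, so rather than guessing at field names, a quick way to match it is to load the sample file and look at one record. A minimal sketch, assuming the file holds a JSON array of conversation records:

```python
import json

# Load the sample conversation data shipped with this repository and print the
# first record, so you can mirror its structure in your own data.
# Assumes the top-level JSON value is a list of conversation records.
with open("sft_data/conversations.json", "r", encoding="utf-8") as f:
    conversations = json.load(f)

print(f"Number of records: {len(conversations)}")
print(json.dumps(conversations[0], indent=2, ensure_ascii=False))
```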
The fine-tuning process follows the guidelines from the LLaDA paper:
- **Data Preprocessing**:
  - Format data as prompt-response pairs
  - Handle multi-turn dialogues
  - Pad with EOS tokens for equal lengths (see the collation sketch after this list)
- **Forward Process**:
  - Apply noise only to the response part
  - Keep the prompt unchanged
- **Loss Calculation**:
  - Calculate loss only on masked tokens in the response
  - Normalize by answer length (the forward process and loss are illustrated in the training-step sketch after this list)
- **Sampling Strategies**:
  - Semi-autoregressive sampling with low-confidence remasking
  - Divide generation into blocks for better control (see the sampling sketch after this list)
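The EOS padding convention can be shown with a small collation helper. This is a sketch, not the code in `preprocess_sft_data.py`: it assumes you already have tokenized `(prompt_ids, response_ids)` pairs (one per assistant turn, with the prompt holding all preceding turns of a multi-turn dialogue) and an `eos_id` from the tokenizer.

```python
import torch

def collate_sft_batch(pairs, eos_id):
    """Pad prompt+response sequences to a common length with EOS tokens.

    `pairs` is a list of (prompt_ids, response_ids) tuples of token ids.
    The EOS padding goes on the response side, so padded positions are
    treated as part of the answer. Illustrative sketch only.
    """
    sequences = [prompt + response for prompt, response in pairs]
    max_len = max(len(seq) for seq in sequences)

    input_ids, prompt_lengths = [], []
    for (prompt, _), seq in zip(pairs, sequences):
        input_ids.append(seq + [eos_id] * (max_len - len(seq)))  # right-pad with EOS
        prompt_lengths.append(len(prompt))

    return {
        "input_ids": torch.tensor(input_ids, dtype=torch.long),
        "prompt_lengths": torch.tensor(prompt_lengths, dtype=torch.long),
    }
```

The resulting `input_ids` and `prompt_lengths` are exactly the quantities the training-step sketch consumes.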
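To make the forward process and loss calculation concrete, here is a minimal sketch of one training step in the spirit of the paper's SFT recipe. It is not the code in `finetune_llada.py`: the function names, the `mask_id` argument, and the assumption that `model(...)` returns Hugging Face-style `.logits` over the full vocabulary are all illustrative.

```python
import torch
import torch.nn.functional as F

def forward_process(input_ids, mask_id, eps=1e-3):
    """Mask every token independently with a per-example probability t ~ U(eps, 1)."""
    b, l = input_ids.shape
    t = (1 - eps) * torch.rand(b, device=input_ids.device) + eps
    p_mask = t[:, None].expand(b, l)
    masked = torch.rand(b, l, device=input_ids.device) < p_mask
    noisy = torch.where(masked, torch.full_like(input_ids, mask_id), input_ids)
    return noisy, p_mask

def sft_step_loss(model, input_ids, prompt_lengths, mask_id):
    """One SFT step: noise only the response, score only the masked response tokens."""
    b, l = input_ids.shape
    noisy, p_mask = forward_process(input_ids, mask_id)

    # Keep the prompt unchanged: restore every position before prompt_lengths.
    positions = torch.arange(l, device=input_ids.device).expand(b, l)
    is_prompt = positions < prompt_lengths[:, None]
    noisy = torch.where(is_prompt, input_ids, noisy)

    # Answer length per example (the EOS padding counts as part of the answer).
    answer_len = (~is_prompt).sum(dim=-1, keepdim=True).expand(b, l)

    masked = noisy == mask_id  # only response positions can still be masked
    logits = model(input_ids=noisy).logits

    # Cross-entropy on masked tokens, reweighted by the masking probability
    # and normalized by the answer length, then averaged over the batch.
    token_loss = F.cross_entropy(
        logits[masked], input_ids[masked], reduction="none"
    ) / p_mask[masked]
    return (token_loss / answer_len[masked]).sum() / b
```

Dividing by `p_mask` plays the role of the 1/t weighting in the diffusion objective, and dividing by `answer_len` implements the normalization by answer length described above.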
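The sampling strategy can be sketched the same way. The loop below is an illustration of semi-autoregressive decoding with low-confidence remasking, not the code in `inference_example.py`; it assumes a batch size of 1, and the parameter names `gen_length`, `block_length`, and `num_steps` are illustrative.

```python
import torch

@torch.no_grad()
def generate(model, prompt_ids, mask_id, gen_length=128, block_length=32, num_steps=8):
    """Semi-autoregressive sampling with low-confidence remasking (illustrative)."""
    device = prompt_ids.device
    prompt_len = prompt_ids.shape[1]

    # Start with the whole response masked, appended after the prompt.
    response = torch.full((1, gen_length), mask_id, dtype=torch.long, device=device)
    x = torch.cat([prompt_ids, response], dim=1)

    # Decode the response block by block, left to right across blocks.
    for block_start in range(prompt_len, prompt_len + gen_length, block_length):
        block_end = min(block_start + block_length, prompt_len + gen_length)
        tokens_per_step = max(1, (block_end - block_start) // num_steps)

        for step in range(num_steps):
            logits = model(input_ids=x).logits
            probs = torch.softmax(logits, dim=-1)
            confidence, prediction = probs.max(dim=-1)

            # Candidates are positions in the current block that are still masked.
            candidates = torch.zeros_like(x, dtype=torch.bool)
            candidates[:, block_start:block_end] = x[:, block_start:block_end] == mask_id
            remaining = int(candidates.sum())
            if remaining == 0:
                break

            # Commit the highest-confidence predictions; everything else stays
            # masked ("low-confidence remasking") and is refined in later steps.
            k = remaining if step == num_steps - 1 else min(tokens_per_step, remaining)
            confidence = confidence.masked_fill(~candidates, float("-inf"))
            top = torch.topk(confidence[0], k).indices
            x[0, top] = prediction[0, top]

    return x[:, prompt_len:]
```

Smaller blocks make generation behave more autoregressively, while larger blocks decode more tokens in parallel per forward pass.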
Requirements:

- PyTorch
- Transformers (version 4.38.2 or later)
- CUDA-capable GPU (recommended)
For more details on LLaDA, refer to the original paper and the official repository.