Zichen Wen1,2,
Jiashu Qu2,
Dongrui Liu2*,
Zhiyuan Liu1,2,
Ruixi Wu1,2,
Yicun Yang1,
Xiangqi Jin1,
Haoyun Xu1,
Xuyang Liu1,
Weijia Li3,2,
Chaochao Lu2,
Jing Shao2,
Conghui He2β,
Linfeng Zhang1β,
1EPIC Lab, Shanghai Jiao Tong University, 2Shanghai AI Laboratory,
3Sun Yat-sen University
βCorresponding authors, *Project lead
2025.09.30π€π€ DIJA now supports Dream-Coder-v0-Instruct-7B, DiffuCoder-7B-Instruct, and DiffuCoder-7B-cpGRPO!2025.07.21π€π€ Our paper is honored to be the #1 Paper of the day!2025.07.16π€π€ We release our latest work DIJA, the first investigation into the safety issues of dLLMs. Code is available!
- π₯ This is the first investigation into the safety issues of dLLMs. We identify and characterize a novel attack pathway against dLLMs, rooted in their bidirectional and parallel decoding mechanisms.
- π₯ We propose DIJA, an automated jailbreak attack pipeline that transforms vanilla jailbreak prompts into interleaved text-mask jailbreak prompts capable of eliciting harmful completions on dLLMs.
- π₯ We conduct comprehensive experiments demonstrating the effectiveness of DIJA across multiple dLLMs compared with existing attack methods, highlighting critical gaps in current alignment strategies and exposing urgent security vulnerabilities in existing dLLM architectures that require immediate addressing.
- π― DIJA achieves the highest ASR-k across all benchmarks, indicating that dLLMs are highly unlikely to refuse answering dangerous or sensitive topics under the DIJA attack.
- π― For the more secure Dream-Instruct, DIJA achieves an improvement of up to 78.5% in ASR-e on JailbreakBench over the best baseline, ReNeLLM, and a 37.7% improvement in StrongREJECT score.
- Clone this repository.
git clone https://github.com/ZichenWen1/DIJA
cd DIJA- Install models
cd hf_models && bash model_download.sh- Environment setup
conda create -n DIJA python=3.10 -y
conda activate DIJA
pip install -r requirements.txt- [Version]: You can set the version number for this run
- [Defense_method]: Choose whether to apply defense during the attack. Options: None, Self-reminder, RPO
- [Victim_model]: Select the targeted diffusion LLM. Options: llada_instruct, llada_1.5, dream_instruct, mmada_mixcot
# Interleaved mask-text prompt construction
cd run_harmbench
bash refine_prompt/run_refine.sh [Version]
# Jailbreak attack and evaluation
bash eval_harmbench.sh DIJA [Defense_method] [Victim_model] [Version] # Interleaved mask-text prompt construction
cd run_jailbreakbench
bash refine_prompt/run_refine.sh [Version]
# Jailbreak attack and evaluation
bash eval_jailbreakbench.sh DIJA [Defense_method] [Victim_model] [Version] # Interleaved mask-text prompt construction
cd run_strongreject
bash refine_prompt/run_refine.sh [Version]
# Jailbreak attack and evaluation
bash eval_strongreject.sh DIJA [Defense_method] [Victim_model] [Version]- Release Inference and Evaluation Code
- Support DiffuCoder, Dream-Coder
- Release the interleaved mask-text prompt
- Support AdvBench evaluation
This project is released under the Apache 2.0 license.
Please consider citing our paper in your publications if our works help your research.
@article{wen2025devil,
title={The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs},
author={Wen, Zichen and Qu, Jiashu and Liu, Dongrui and Liu, Zhiyuan and Wu, Ruixi and Yang, Yicun and Jin, Xiangqi and Xu, Haoyun and Liu, Xuyang and Li, Weijia and others},
journal={arXiv preprint arXiv:2507.11097},
year={2025}
}We would like to express our sincere gratitude to the open-source contributions from the teams behind LLaDA, LLaDA-1.5, Dream, and MMaDA.
We are deeply appreciative of the open-source efforts by the developers of HarmBench, JailbreakBench, and StrongREJECT.
For any questions about our paper or code, please email zichen.wen@outlook.com.



