🎭 The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs

Zichen Wen¹,², Jiashu Qu², Dongrui Liu², Zhiyuan Liu¹,², Ruixi Wu¹,², Yicun Yang¹, Xiangqi Jin¹,
Haoyun Xu¹, Xuyang Liu¹, Weijia Li³,², Chaochao Lu², Jing Shao², Conghui He²✉, Linfeng Zhang¹✉

¹EPIC Lab, Shanghai Jiao Tong University, ²Shanghai AI Laboratory,
³Sun Yat-sen University

✉Corresponding authors, Project lead

📰 News

2025.09.30 🤗🤗 DIJA now supports Dream-Coder-v0-Instruct-7B, DiffuCoder-7B-Instruct, and DiffuCoder-7B-cpGRPO!
2025.07.21 🤗🤗 Our paper was named #1 Paper of the Day!
2025.07.16 🤗🤗 We release our latest work DIJA, the first investigation into the safety issues of dLLMs. Code is available!

👀 Overview

💥 This is the first investigation into the safety issues of diffusion LLMs (dLLMs). We identify and characterize a novel attack pathway against dLLMs, rooted in their bidirectional and parallel decoding mechanisms.
💥 We propose DIJA, an automated jailbreak attack pipeline that transforms vanilla jailbreak prompts into interleaved text-mask jailbreak prompts capable of eliciting harmful completions from dLLMs.
💥 We conduct comprehensive experiments across multiple dLLMs that demonstrate DIJA's effectiveness relative to existing attack methods. The results highlight critical gaps in current alignment strategies and expose security vulnerabilities in existing dLLM architectures that need urgent attention.

📊 Performance

🎯 DIJA achieves the highest keyword-based attack success rate (ASR-k) across all benchmarks, indicating that dLLMs rarely refuse to respond to prompts on dangerous or sensitive topics under the DIJA attack.
🎯 On the more secure Dream-Instruct, DIJA improves evaluator-based attack success rate (ASR-e) on JailbreakBench by up to 78.5% over the best baseline, ReNeLLM, and improves the StrongREJECT score by 37.7%.

🛠 Preparation

Clone this repository.

  git clone https://github.com/ZichenWen1/DIJA
  cd DIJA

Download models

  # Run in a subshell so you remain in the repository root afterwards
  (cd hf_models && bash model_download.sh)

Environment setup

  conda create -n DIJA python=3.10 -y
  conda activate DIJA
  pip install -r requirements.txt

🧪 Usage and Evaluation

Parameters

[Version]: A version label of your choice for this run. Use the same value for the prompt-construction step and the evaluation step.
[Defense_method]: Choose whether to apply defense during the attack. Options: None, Self-reminder, RPO
[Victim_model]: Select the targeted diffusion LLM. Options: llada_instruct, llada_1.5, dream_instruct, mmada_mixcot
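
The option lists above can be checked before launching a long run. The following is a minimal sketch and is not part of the repository: `check_dija_args` is a hypothetical helper name, and it assumes the option strings are matched exactly as written above (case-sensitive). Models announced in the News section are not listed under Parameters, so their identifiers are not included here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with DIJA): validate arguments against the
# option lists documented in the Parameters section.
# Assumption: option names are case-sensitive and spelled exactly as listed.
check_dija_args() {
  local defense="$1" victim="$2" version="$3"

  case "$defense" in
    None|Self-reminder|RPO) ;;
    *) echo "unknown Defense_method: $defense" >&2; return 1 ;;
  esac

  case "$victim" in
    llada_instruct|llada_1.5|dream_instruct|mmada_mixcot) ;;
    *) echo "unknown Victim_model: $victim" >&2; return 1 ;;
  esac

  # The version label is treated as free-form; only an empty value is rejected.
  if [ -z "$version" ]; then
    echo "Version must not be empty" >&2
    return 1
  fi
}
```

For example, `check_dija_args RPO dream_instruct v1 && echo ok` prints `ok`, while an unlisted model name returns a non-zero status and an error on stderr.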

HarmBench evaluation

  # Interleaved mask-text prompt construction (start from the repository root)
  cd run_harmbench
  bash refine_prompt/run_refine.sh [Version]

  # Jailbreak attack and evaluation
  bash eval_harmbench.sh DIJA [Defense_method] [Victim_model] [Version]

JailbreakBench evaluation

  # Interleaved mask-text prompt construction (start from the repository root)
  cd run_jailbreakbench
  bash refine_prompt/run_refine.sh [Version]

  # Jailbreak attack and evaluation
  bash eval_jailbreakbench.sh DIJA [Defense_method] [Victim_model] [Version]

StrongREJECT evaluation

  # Interleaved mask-text prompt construction (start from the repository root)
  cd run_strongreject
  bash refine_prompt/run_refine.sh [Version]

  # Jailbreak attack and evaluation
  bash eval_strongreject.sh DIJA [Defense_method] [Victim_model] [Version]
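
The three benchmarks follow the same two-step pattern, so a full sweep can be scripted. The sketch below only prints the commands (a dry run) rather than executing them. `print_dija_runs` is a hypothetical helper name, and wrapping each benchmark in a subshell is an assumption made so that every `cd` starts from the repository root.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper (not part of the repository): print the
# documented two-step command sequence for each benchmark.
# Each line is wrapped in ( ... ) so the cd does not leak into the next run.
print_dija_runs() {
  local defense="$1" victim="$2" version="$3" bench
  for bench in harmbench jailbreakbench strongreject; do
    printf '(cd run_%s && bash refine_prompt/run_refine.sh %s && bash eval_%s.sh DIJA %s %s %s)\n' \
      "$bench" "$version" "$bench" "$defense" "$victim" "$version"
  done
}

# Example: list the commands for one victim model without running them.
print_dija_runs None dream_instruct v1
```

Reviewing the printed commands first makes it easy to confirm that the same version label is passed to both steps of each benchmark.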

📌 TODO

Release Inference and Evaluation Code
Support DiffuCoder, Dream-Coder
Release the interleaved mask-text prompt
Support AdvBench evaluation

🔑 License

This project is released under the Apache 2.0 license.

📍 Citation

If our work helps your research, please consider citing our paper.

@article{wen2025devil,
  title={The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs},
  author={Wen, Zichen and Qu, Jiashu and Liu, Dongrui and Liu, Zhiyuan and Wu, Ruixi and Yang, Yicun and Jin, Xiangqi and Xu, Haoyun and Liu, Xuyang and Li, Weijia and others},
  journal={arXiv preprint arXiv:2507.11097},
  year={2025}
}

👍 Acknowledgments

Diffusion LLMs

We would like to express our sincere gratitude to the open-source contributions from the teams behind LLaDA, LLaDA-1.5, Dream, and MMaDA.

Jailbreak Benchmarks

We are deeply appreciative of the open-source efforts by the developers of HarmBench, JailbreakBench, and StrongREJECT.

📩 Contact

For any questions about our paper or code, please email zichen.wen@outlook.com.

