(ICLR 2026 🔥) Code for "The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs"

ZichenWen1/DIJA

🎭 The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs

Zichen Wen1,2, Jiashu Qu2, Dongrui Liu2*, Zhiyuan Liu1,2, Ruixi Wu1,2, Yicun Yang1, Xiangqi Jin1,
Haoyun Xu1, Xuyang Liu1, Weijia Li3,2, Chaochao Lu2, Jing Shao2, Conghui He2✉, Linfeng Zhang1✉

1EPIC Lab, Shanghai Jiao Tong University, 2Shanghai AI Laboratory,
3Sun Yat-sen University

✉Corresponding authors, *Project lead

📰 News

  • 2025.09.30 🤗🤗 DIJA now supports Dream-Coder-v0-Instruct-7B, DiffuCoder-7B-Instruct, and DiffuCoder-7B-cpGRPO!
  • 2025.07.21 🤗🤗 Our paper was honored as the #1 Paper of the day!
  • 2025.07.16 🤗🤗 We released our latest work, DIJA, the first investigation into the safety issues of dLLMs. Code is available!

👀 Overview

  • 💥 This is the first investigation into the safety issues of dLLMs. We identify and characterize a novel attack pathway against dLLMs, rooted in their bidirectional and parallel decoding mechanisms.
  • 💥 We propose DIJA, an automated jailbreak attack pipeline that transforms vanilla jailbreak prompts into interleaved text-mask jailbreak prompts capable of eliciting harmful completions on dLLMs.
  • 💥 We conduct comprehensive experiments demonstrating the effectiveness of DIJA across multiple dLLMs compared with existing attack methods. The results highlight critical gaps in current alignment strategies and expose security vulnerabilities in existing dLLM architectures that need to be addressed urgently.

📊 Performance

  • 🎯 DIJA achieves the highest ASR-k across all benchmarks, indicating that dLLMs are highly unlikely to refuse to answer dangerous or sensitive queries under the DIJA attack.
  • 🎯 For the more secure Dream-Instruct, DIJA achieves an improvement of up to 78.5% in ASR-e on JailbreakBench over the best baseline, ReNeLLM, and a 37.7% improvement in StrongREJECT score.

🛠 Preparation

  1. Clone this repository.
  git clone https://github.com/ZichenWen1/DIJA
  cd DIJA
  2. Download the models.
  cd hf_models && bash model_download.sh
  3. Set up the environment.
  conda create -n DIJA python=3.10 -y
  conda activate DIJA
  pip install -r requirements.txt

🧪 Usage and Evaluation

Parameters

  • [Version]: A version label for this run.
  • [Defense_method]: The defense to apply during the attack. Options: None, Self-reminder, RPO
  • [Victim_model]: The target diffusion LLM. Options: llada_instruct, llada_1.5, dream_instruct, mmada_mixcot
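Because the evaluation scripts take these values as positional arguments, a typo in an option name can be easy to miss. The following is a minimal sketch of a pre-flight check, not part of the repository: the function names `in_list` and `check_params` are hypothetical, and the allowed values are copied from the list above only (option names for the newer Dream-Coder and DiffuCoder models are not documented here, so they are not included).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with DIJA): validate run parameters
# against the option lists documented in this README.

DEFENSE_METHODS="None Self-reminder RPO"
VICTIM_MODELS="llada_instruct llada_1.5 dream_instruct mmada_mixcot"

# Return 0 if $1 appears in the space-separated list $2, 1 otherwise.
in_list() {
  local needle="$1" item
  for item in $2; do
    [ "$item" = "$needle" ] && return 0
  done
  return 1
}

# Usage: check_params <Defense_method> <Victim_model> <Version>
check_params() {
  if [ "$#" -ne 3 ]; then
    echo "usage: check_params <Defense_method> <Victim_model> <Version>" >&2
    return 2
  fi
  in_list "$1" "$DEFENSE_METHODS" || { echo "unknown Defense_method: $1" >&2; return 1; }
  in_list "$2" "$VICTIM_MODELS" || { echo "unknown Victim_model: $2" >&2; return 1; }
  [ -n "$3" ] || { echo "Version must not be empty" >&2; return 1; }
  return 0
}
```

For example, `check_params RPO dream_instruct v1` returns 0, while `check_params rpo dream_instruct v1` prints an error and returns 1, since the comparison is case-sensitive.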

HarmBench evaluation

  # Interleaved mask-text prompt construction
  cd run_harmbench
  bash refine_prompt/run_refine.sh [Version]

  # Jailbreak attack and evaluation
  bash eval_harmbench.sh DIJA [Defense_method] [Victim_model] [Version]

JailbreakBench evaluation

  # Interleaved mask-text prompt construction
  cd run_jailbreakbench
  bash refine_prompt/run_refine.sh [Version]

  # Jailbreak attack and evaluation
  bash eval_jailbreakbench.sh DIJA [Defense_method] [Victim_model] [Version]

StrongREJECT evaluation

  # Interleaved mask-text prompt construction
  cd run_strongreject
  bash refine_prompt/run_refine.sh [Version]

  # Jailbreak attack and evaluation
  bash eval_strongreject.sh DIJA [Defense_method] [Victim_model] [Version]
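The three benchmark sections above follow the same pattern: change into `run_<benchmark>`, build the prompts, then run `eval_<benchmark>.sh`. As a sketch of that pattern, the hypothetical function below (`print_dija_commands` is an illustrative name, not part of the repository) prints the commands for one benchmark without executing them, which can help when checking arguments before a long run.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper (not shipped with DIJA): print the commands
# documented above for one benchmark, without running anything.
# Usage: print_dija_commands <benchmark> <Defense_method> <Victim_model> <Version>
print_dija_commands() {
  local bench="$1" defense="$2" model="$3" version="$4"
  # Only the three benchmarks documented in this README are accepted.
  case "$bench" in
    harmbench|jailbreakbench|strongreject) ;;
    *) echo "unknown benchmark: $bench" >&2; return 1 ;;
  esac
  echo "cd run_${bench}"
  echo "bash refine_prompt/run_refine.sh ${version}"
  echo "bash eval_${bench}.sh DIJA ${defense} ${model} ${version}"
}
```

Calling `print_dija_commands harmbench None llada_instruct v1` prints the same three lines as the HarmBench section, with the placeholders filled in.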

📌 TODO

  • Release Inference and Evaluation Code
  • Support DiffuCoder, Dream-Coder
  • Release the interleaved mask-text prompt
  • Support AdvBench evaluation

🔑 License

This project is released under the Apache 2.0 license.

📝 Citation

Please consider citing our paper in your publications if our work helps your research.

@article{wen2025devil,
  title={The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs},
  author={Wen, Zichen and Qu, Jiashu and Liu, Dongrui and Liu, Zhiyuan and Wu, Ruixi and Yang, Yicun and Jin, Xiangqi and Xu, Haoyun and Liu, Xuyang and Li, Weijia and others},
  journal={arXiv preprint arXiv:2507.11097},
  year={2025}
}

👍 Acknowledgments

Diffusion LLMs

We would like to express our sincere gratitude to the open-source contributions from the teams behind LLaDA, LLaDA-1.5, Dream, and MMaDA.

Jailbreak Benchmarks

We are deeply appreciative of the open-source efforts by the developers of HarmBench, JailbreakBench, and StrongREJECT.

📩 Contact

For any questions about our paper or code, please email zichen.wen@outlook.com.
