a2d

Convert open-weight autoregressive LLMs into diffusion language models. Locally, with one command.

Status: Phase 0 (walking skeleton) landed. The CLI, contracts codegen, worker protocol, and run directories exist and run end to end as a no-op pipeline. The actual conversion recipe lands in Phase 2 of the roadmap; nothing converts a real model yet. Contributions welcome - see CONTRIBUTING.md.

What this is — and what it isn't

Let's be precise about the claim, because the field deserves it:

a2d is not a new method. Converting an AR model into a diffusion model by continued training is established research — AR2Diff formulated it, DiffuGPT/DiffuLLaMA demonstrated it from 127M to 7B, and Dream 7B shipped a strong open model built exactly this way (initialized from Qwen2.5). "Recipes for any AR model" exist too (Tiny-A2D, the dLLM library). If you want the science, start with those papers — and see Prior art, which a2d builds on directly.

a2d is the first tool that makes that method universal, safe, and one command. What exists today is research scripts, a training library, and one-off model releases. What doesn't exist — and what a2d is — is a product you point at an arbitrary local checkpoint and get:

A verdict before you spend anything. a2d detect reads the model's config (no weights, no GPU) and tells you whether it can convert, how, and — when it can't — exactly why.
Automated conversion with a safety gate. Attention surgery, an identity check that proves the surgery changed nothing before training starts, then the diffusion training recipe.
Reproducible runs. Every conversion writes a manifest (source hash, config, capability set, test results) — same inputs, same run.
An honest no. Architectures the recipe doesn't fit are rejected with reasons, never silently mis-converted.
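
As a rough illustration of the "same inputs, same run" idea, a manifest can be keyed on a content hash of the source checkpoint plus the run configuration. The sketch below uses only the Python standard library. The function names (`hash_directory`, `build_manifest`, `manifest_id`) and the manifest field names are hypothetical, chosen for this example; a2d's real manifest schema is whatever the repository's contracts define and may differ.

```python
import hashlib
import json
from pathlib import Path


def hash_directory(model_dir):
    """SHA-256 over relative file names and file contents, in sorted order."""
    digest = hashlib.sha256()
    root = Path(model_dir)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        with path.open("rb") as fh:
            # Read in 1 MiB chunks so large weight files are never fully in memory.
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
    return digest.hexdigest()


def build_manifest(model_dir, run_config, capabilities):
    """Assemble a manifest dict (hypothetical field names)."""
    return {
        "source_hash": hash_directory(model_dir),
        "config": run_config,
        "capabilities": sorted(capabilities),
        "test_results": {},  # would be filled in after the identity gate and eval
    }


def manifest_id(manifest):
    """Stable ID: hash of the canonical (sorted-key) JSON encoding."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Sorting both the file list and the capability set keeps the ID independent of filesystem or set iteration order, so two runs over identical inputs produce the same ID.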

What "universal" means (scope, honestly)

Architecture-universal within the AR-transformer family — dense and Mixture-of-Experts. That covers the Llama / Qwen / GLM / Gemma / OLMoE lineage, i.e. most open-weight releases. It does not cover other paradigms: Mamba/SSM, encoder-decoder, or models that are already non-autoregressive. Those get a clean unsupported at the gate, with reasons.

New model drops that fit the family should work day-one with zero changes to a2d's code: detection is generic over the standard HF config vocabulary, and conversion runs on the HF ecosystem as soon as transformers supports the model. Support is defined by capabilities (attention variant, FFN family, weight format), not by a hardcoded model list, so gaps are visible, named, and additively fixable.

Converts	Rejected (with reasons)
Dense AR transformers (GPT-2, Pythia, Llama, Qwen…)	SSM/Mamba (`paradigm`)
MoE AR transformers (OLMoE, Qwen-MoE…)	MLA-attention models (`attn.mla`) — until a handler lands
Sliding-window / attention-sink models (Mistral, Gemma, GPT-OSS) — planned, capability-gated	ONNX-only exports (`format`)
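
To make the capability-based gate concrete, here is a minimal sketch of a config-only check in Python. It is not a2d's detector. The reason codes `paradigm` and `attn.mla` come from the table above; the capability names `ffn.dense` and `ffn.moe`, the `detect` function, and the choice of config keys to inspect are assumptions made for this example. The keys themselves (`model_type`, `is_encoder_decoder`, `kv_lora_rank`, `num_local_experts`, `num_experts`) are ones that appear in Hugging Face model configs, but a real detector would need to cover many more.

```python
import json
from pathlib import Path

# Illustrative only: a real list of non-transformer model types would be longer.
SSM_MODEL_TYPES = {"mamba", "mamba2"}


def detect(config):
    """Return a verdict from an HF-style config dict. Reads no weights."""
    reasons = []
    if config.get("model_type") in SSM_MODEL_TYPES or config.get("is_encoder_decoder"):
        reasons.append("paradigm")
    # Assumption: a KV low-rank projection size signals MLA-style attention.
    if config.get("kv_lora_rank") is not None:
        reasons.append("attn.mla")

    # Different MoE families name the expert count differently.
    n_experts = config.get("num_local_experts") or config.get("num_experts")
    capabilities = ["ffn.moe" if n_experts else "ffn.dense"]

    return {
        "verdict": "unsupported" if reasons else "convertible",
        "reasons": reasons,
        "capabilities": capabilities,
    }


def detect_path(model_dir):
    """Convenience wrapper: parse config.json from a local checkpoint directory."""
    with (Path(model_dir) / "config.json").open(encoding="utf-8") as fh:
        return detect(json.load(fh))
```

Because the verdict is computed from named capabilities rather than a model list, an unsupported model reports which capability is missing, which is what makes gaps additively fixable.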

How it works (30 seconds)

The model already knows language; that knowledge lives in the pretrained weights, which the attention surgery leaves untouched. The conversion changes how the model reads and what it practices:

Detect & gate — parse config.json → normalized spec → capability check.
Patch — open causal attention to bidirectional (gradually, via annealing); drop the next-token shift.
Identity gate — at anneal=0 the patched model must match the original's logits exactly; fail = abort, nothing wasted.
Train — masked-diffusion objective (MDLM; block diffusion later): fill-in-the-blank at varying mask ratios over a few billion tokens.
Output — a standard HF-layout checkpoint (+ provenance manifest and eval report) that loads with normal tooling.
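
The patch and identity-gate steps above can be sketched on a toy single-head attention in pure Python. This is one possible annealing scheme, not necessarily the one a2d uses (the actual recipe is described in docs/ARCHITECTURE.md): future positions receive an additive bias of log(anneal), which is negative infinity at anneal=0 (exactly causal) and 0 at anneal=1 (fully bidirectional). All function names here are hypothetical, and the check compares attention weights on a toy input rather than full-model logits.

```python
import math


def causal_bias(seq_len):
    """Standard causal mask as an additive bias: -inf above the diagonal."""
    return [[0.0 if j <= i else -math.inf for j in range(seq_len)]
            for i in range(seq_len)]


def annealed_bias(seq_len, anneal):
    """Bias of 0 for past/self positions and log(anneal) for future ones."""
    if not 0.0 <= anneal <= 1.0:
        raise ValueError("anneal must be in [0, 1]")
    future = math.log(anneal) if anneal > 0.0 else -math.inf
    return [[0.0 if j <= i else future for j in range(seq_len)]
            for i in range(seq_len)]


def attention_weights(scores, bias):
    """Row-wise softmax of scores + bias."""
    weights = []
    for score_row, bias_row in zip(scores, bias):
        logits = [s + b for s, b in zip(score_row, bias_row)]
        peak = max(logits)  # finite: each row can always attend to itself
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


def identity_gate(scores):
    """True if the patched path at anneal=0 reproduces the causal weights exactly."""
    n = len(scores)
    patched = attention_weights(scores, annealed_bias(n, 0.0))
    original = attention_weights(scores, causal_bias(n))
    return patched == original
```

Exact equality is the point of the gate in this sketch: at anneal=0 the patched code path performs the same arithmetic as the causal one, so any difference indicates a bug in the surgery rather than floating-point noise.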

# planned CLI
a2d detect  ./models/qwen2.5-1.5b            # verdict + plan, config-only
a2d convert ./models/qwen2.5-1.5b --out runs/qwen-diff --data ./corpus
a2d eval    runs/qwen-diff
a2d sample  runs/qwen-diff -p "The cat"

Weights must already be downloaded — a2d never fetches models. Third-party model code (trust_remote_code) never runs without an explicit flag.

Roadmap

Phased, no dates: walking skeleton → detection & gate (no GPU) → dense conversion (GPT-2) → eval harness → MoE (OLMoE) → block diffusion & fast sampling → hard architectures (SWA/sinks/quantized: Mistral, Gemma, GPT-OSS) → polish. Details and exit criteria: docs/SPEC-HANDOFF.md.

Prior art & credit

a2d packages other people's science. Read and cite them:

Work	What it contributed
AR2Diff — Transfer Learning for Text Diffusion	The pretrain-AR → continue-as-diffusion formulation
DiffuGPT / DiffuLLaMA	Demonstrated adaptation 127M–7B, <200B tokens; the recipe a2d's core follows
Dream 7B	AR-init diffusion at scale (from Qwen2.5); context-adaptive noise rescheduling
MDLM	The masked-diffusion objective a2d uses first
BD3LM / block diffusion	Block-parallel objective/sampling (planned)
LLaDA	From-scratch proof that diffusion LMs scale competitively
dLLM library / Tiny-A2D	Open training/eval infra and any-model recipes; a candidate dependency of a2d's worker
Full landscape	`docs/LANDSCAPE.md`

If a2d's framing ever drifts toward claiming the method — file an issue. The honest claim is the product.

Design docs

docs/ARCHITECTURE.md — the ML conversion recipe (objectives, annealing, identity test)
docs/SPEC-HANDOFF.md — the tool: lifecycle, extensibility model, contracts, roadmap
docs/LANDSCAPE.md — prior art & positioning

License

MIT. See the LICENSE file for the full terms.
