@drewjin
Contributor

@drewjin drewjin commented Dec 29, 2025

img_v3_02te_da402ad2-9193-4dce-aec0-0cb44112576g

Motivation

Description

This pull request adds support for the Fast-dLLM-v2 (fdv2) native inference strategy. It builds on the existing Block Diffusion (bd) implementation and integrates the fdv2 strategy, which features a specialized Intra-block Dual-Cache mechanism.

The implementation uses a hierarchical, two-level state management system, which makes the orchestration logic more complex (refer to the attached diagram):

  • Sub-block States: ACTIVE, TO_CACHE, and IN_CACHE.
  • Parent-block States: CACHING and DECODING.
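
As a rough illustration of the two-level hierarchy listed above, the states could be modeled as two enums with each parent block owning its sub-blocks. This is only a sketch: the class names (`SubBlockState`, `ParentBlockState`, `SubBlock`, `ParentBlock`) and the `count` helper are hypothetical and are not taken from the Diffulex code, and no transition rules are implied because the description does not spell them out.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List


class SubBlockState(Enum):
    """Sub-block states named in the PR description."""
    ACTIVE = auto()
    TO_CACHE = auto()
    IN_CACHE = auto()


class ParentBlockState(Enum):
    """Parent-block states named in the PR description."""
    CACHING = auto()
    DECODING = auto()


@dataclass
class SubBlock:
    # Hypothetical container; the default state is an assumption.
    state: SubBlockState = SubBlockState.ACTIVE


@dataclass
class ParentBlock:
    # Hypothetical container: one parent block holds several sub-blocks.
    state: ParentBlockState
    sub_blocks: List[SubBlock] = field(default_factory=list)

    def count(self, state: SubBlockState) -> int:
        """Return how many sub-blocks are currently in the given state."""
        return sum(1 for sb in self.sub_blocks if sb.state is state)


parent = ParentBlock(
    state=ParentBlockState.DECODING,
    sub_blocks=[
        SubBlock(SubBlockState.IN_CACHE),
        SubBlock(SubBlockState.TO_CACHE),
        SubBlock(),  # defaults to ACTIVE
        SubBlock(),
    ],
)
```

Keeping the two state sets in separate enums makes it harder to assign a parent-level state to a sub-block by mistake.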

Key Scheduling Logic:
State management changes scheduling behavior relative to traditional methods: the granularity at which the scheduler allocates tokens depends on the Parent-block state.

  • In the CACHING state, the scheduler allocates tokens based on the full Parent-block size.
  • In the DECODING state, the scheduler switches to allocating tokens at the Sub-block granularity.
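
The allocation rule in the two bullets above can be sketched as a small function. This is a minimal illustration, not the scheduler in this PR: `tokens_to_allocate`, its parameters, and the example sizes (32 and 8) are hypothetical.

```python
from enum import Enum, auto


class ParentBlockState(Enum):
    """Parent-block states named in the PR description."""
    CACHING = auto()
    DECODING = auto()


def tokens_to_allocate(
    state: ParentBlockState,
    parent_block_size: int,
    sub_block_size: int,
) -> int:
    """Pick the allocation granularity from the parent-block state.

    Hypothetical helper: CACHING allocates for the full parent block,
    DECODING allocates for a single sub-block.
    """
    if state is ParentBlockState.CACHING:
        return parent_block_size
    if state is ParentBlockState.DECODING:
        return sub_block_size
    raise ValueError(f"unexpected parent-block state: {state!r}")


# Illustrative sizes only; the real block sizes are not given in this PR.
caching_tokens = tokens_to_allocate(ParentBlockState.CACHING, 32, 8)
decoding_tokens = tokens_to_allocate(ParentBlockState.DECODING, 32, 8)
```

With these illustrative sizes, a parent block in CACHING is allocated 32 tokens, while one in DECODING is allocated 8.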

TODO List

  • Implement fdv2 strategy engine.
  • Implement fdv2 attention metadata.
  • Adapt fdv2 attention kernels for the new strategy.

drewjin and others added 3 commits December 29, 2025 13:34
@github-actions

👋 Hi! Thank you for contributing to the Diffulex project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

@coderabbitai

coderabbitai bot commented Dec 31, 2025

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

… modify uvicorn index URL, and improve error handling in attention module; remove unused profiling function from example scripts
@drewjin drewjin mentioned this pull request Jan 5, 2026
17 tasks