Merged
45 commits
c7ee24b
feat(mode): sdar inference supported, decoding kv cache slot mapping …
drewjin Dec 25, 2025
1741805
feat: add test suite and utility functions for flash attention kernel…
drewjin Dec 25, 2025
0d75af5
feat(kernel): update the page_table fetch logics of decoding_kernel f…
drewjin Dec 25, 2025
191e706
fix: dllm_flash_attn_decode_kernel recompilation problem fixed
drewjin Dec 27, 2025
d2507ac
fix: all attn kernels available for inference, checking functions ava…
drewjin Dec 28, 2025
c06b7ef
fix: fix kernel compilation error on Hopper devices via disabling TMA…
drewjin Dec 28, 2025
8434932
test: add test cases for multiround decoding
drewjin Dec 28, 2025
535e296
feat(strategy): create fast-dllm-v2 strategy
drewjin Dec 29, 2025
90a518b
update .gitignore
drewjin Dec 29, 2025
714f915
feat(sequence): add new sub-block statuses and attributes to FDV2SubB…
drewjin Dec 29, 2025
39c0d7e
chore: update GitHub workflows to grant write permissions for issues …
drewjin Dec 29, 2025
65edadd
feat: add Linear layer quantization strategy framework
Dec 31, 2025
fc32954
feat: implement W8A16 Linear quantization strategy (int8 weight + bf1…
Dec 31, 2025
266ea93
perf: implement lazy cache for W8A16 Linear quantization strategy
Dec 31, 2025
64e4347
feat: implement W8A16 TileLang kernel for Linear quantization
Dec 31, 2025
1cdf260
chore: add dependabot configuration for GitHub Actions updates
drewjin Dec 31, 2025
3d6c8ee
chore: add configuration files for code formatting, linting, and cont…
drewjin Dec 31, 2025
84b819f
docs: update CONTRIBUTING.md to reflect project name change from Tile…
drewjin Dec 31, 2025
039693c
Merge branch 'main' into feat/fast-dllm-v2
drewjin Dec 31, 2025
ea47276
feat: add a warmup mechanism and performance comparison to test_text_generation.py
Dec 31, 2025
9ba300d
feat: implement load-time quantization and memory-saving for W8A16 Li…
Dec 31, 2025
ca3007c
Optimize W8A16 and W4A16 kernels: move per-channel scale from weight …
Dec 31, 2025
9d4ad6c
[CI][Docs] Update Workflow and Documents
drewjin Dec 31, 2025
833b32c
Improve W8A8/W4A8 quality by using FP16 scales instead of BF16
Jan 1, 2026
f9a9e1a
chore: update pyproject.toml to add pandas and tilelang dependencies,…
drewjin Jan 5, 2026
6055b39
Merge branch 'feat/fast-dllm-v2' into feat/enhance-strategy
drewjin Jan 5, 2026
ba2801a
feat: implement Diffulex benchmark framework with support for multipl…
drewjin Jan 5, 2026
47b5e9d
feat: add logging capabilities and configuration management to Difful…
drewjin Jan 5, 2026
5aa3bf4
chore: add make.bat into the build scripts of docs
drewjin Jan 5, 2026
50f803d
chore: add offline evaluation script and update tilelang dependency
drewjin Jan 5, 2026
2e03ca7
bugfix: fix config dataclass mutable default and field propagation in…
drewjin Jan 5, 2026
4c5d860
bugfix: _dp_child_entry missing decoding_strategy
drewjin Jan 5, 2026
15704df
feat: introduce Diffulex Profiler for performance analysis with modul…
drewjin Jan 5, 2026
7e65c0b
bugfix: try to fix profiler bug, upload and sync first
drewjin Jan 6, 2026
5b8352f
Merge pull request #18 from drewjin/feat/enhance-strategy
drewjin Jan 6, 2026
c74b14b
Remove AttnQ quantization strategy support
Jan 12, 2026
f8aa715
Merge remote-tracking branch 'fork/main' into feat/kv-cache-fp8-support
Jan 12, 2026
67686e0
Merge branch 'zhijie-group:feat/kv-cache-fp8-support' into feat/kv-ca…
luozixin2 Jan 12, 2026
0d9dd96
Merge branch 'zhijie-group:v0.0.1' into v0.0.1
luozixin2 Jan 12, 2026
44fca07
Merge remote-tracking branch 'fork/v0.0.1' into feat/kv-cache-fp8-sup…
Jan 12, 2026
b4a4ed1
fix: fix the scale update logic in the FP8 KV cache RunningMax strategy
Jan 13, 2026
7b15d65
chore: remove the .cursor directory and add it to .gitignore
Jan 13, 2026
9015510
Merge commit '67686e0' into feat/kv-cache-fp8-support
Jan 13, 2026
426b314
feat: optimize W8A16 decode and FP8 KV varlen path
Jan 14, 2026
dde9962
feat: integrate Marlin/AllSpark INT8 W8A16 quantization strategy
Jan 16, 2026
322 changes: 0 additions & 322 deletions .cursor/plans/integrate_fp8_in_attention_layers.plan.md

This file was deleted.

44 changes: 44 additions & 0 deletions .editorconfig
@@ -0,0 +1,44 @@
# https://editorconfig.org/

root = true

[*]
charset = utf-8
end_of_line = lf
indent_style = space
indent_size = 4
trim_trailing_whitespace = true
insert_final_newline = true

[*.{py,pyi}]
indent_size = 4

[*.{cpp,hpp,cxx,cc,c,h,cu,cuh}]
indent_size = 2

[{*.cmake,CMakeLists.txt}]
indent_size = 2

[*.{yaml,yml}]
indent_size = 2

[.clang-{format,tidy}]
indent_size = 2

[Makefile]
indent_style = tab

[*.sh]
indent_size = 4

[*.bat]
indent_size = 4
end_of_line = crlf

[*.md]
indent_size = 2
x-soft-wrap-text = true

[*.rst]
indent_size = 4
x-soft-wrap-text = true
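
The `.editorconfig` above relies on later sections overriding earlier ones for the same property. A minimal sketch of that cascade follows, assuming a simplified matcher (basename-only matching, one level of brace expansion, and only a subset of the sections and properties). `SECTIONS`, `expand_braces`, and `resolve` are hypothetical names for illustration and are not part of any EditorConfig library.

```python
import fnmatch
import re

# Subset of the sections above, in file order (later entries win).
SECTIONS = [
    ("*", {"indent_style": "space", "indent_size": "4", "end_of_line": "lf"}),
    ("*.{py,pyi}", {"indent_size": "4"}),
    ("*.{cpp,hpp,cxx,cc,c,h,cu,cuh}", {"indent_size": "2"}),
    ("{*.cmake,CMakeLists.txt}", {"indent_size": "2"}),
    ("*.{yaml,yml}", {"indent_size": "2"}),
    ("Makefile", {"indent_style": "tab"}),
    ("*.bat", {"indent_size": "4", "end_of_line": "crlf"}),
    ("*.md", {"indent_size": "2"}),
]


def expand_braces(pattern):
    """Expand `{a,b}` alternatives into plain glob patterns."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[: match.start()], pattern[match.end():]
    expanded = []
    for alternative in match.group(1).split(","):
        expanded.extend(expand_braces(head + alternative + tail))
    return expanded


def resolve(filename):
    """Merge the properties of every matching section, in file order."""
    basename = filename.rsplit("/", 1)[-1]
    props = {}
    for pattern, values in SECTIONS:
        globs = expand_braces(pattern)
        if any(fnmatch.fnmatchcase(basename, g) for g in globs):
            props.update(values)  # later sections override earlier ones
    return props


print(resolve("src/kernel.cu"))
print(resolve("docs/make.bat"))
```

Under these assumptions, a `.cu` file inherits `indent_style = space` and `end_of_line = lf` from `[*]` while its `indent_size` is overridden to 2, and a `.bat` file ends up with `end_of_line = crlf`.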
10 changes: 10 additions & 0 deletions .gitattributes
@@ -0,0 +1,10 @@
* text eol=lf
*.bat eol=crlf

*.svg binary
*.jpg binary
*.jpeg binary
*.png binary
*.gif binary

*.h linguist-language=C++
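
As a rough illustration of what the `eol` rules above ask for in the working tree, the following sketch checks whether a file's bytes use the expected line endings. `expected_eol` and `has_expected_eol` are hypothetical helpers, not part of Git or this repository, and the sketch models only the two `eol` rules and the `binary` extensions listed above rather than Git's full attribute matching.

```python
from pathlib import PurePosixPath
from typing import Optional

# Extensions marked `binary` in the .gitattributes above.
BINARY_SUFFIXES = {".svg", ".jpg", ".jpeg", ".png", ".gif"}


def expected_eol(path: str) -> Optional[bytes]:
    """Return the expected line ending for `path`, or None for binary files."""
    suffix = PurePosixPath(path).suffix
    if suffix in BINARY_SUFFIXES:
        return None
    if suffix == ".bat":
        return b"\r\n"  # `*.bat eol=crlf`
    return b"\n"  # `* text eol=lf`


def has_expected_eol(path: str, data: bytes) -> bool:
    """Check that every line break in `data` matches the expected style."""
    eol = expected_eol(path)
    if eol is None:
        return True  # binary files are not normalized
    if eol == b"\r\n":
        # Every LF must be part of a CRLF pair.
        return data.count(b"\n") == data.count(b"\r\n")
    return b"\r\n" not in data


print(has_expected_eol("docs/make.bat", b"@echo off\r\n"))
print(has_expected_eol("setup.py", b"import os\r\n"))
```

A check like this could run in CI to catch files committed before the attributes were added, although `git add --renormalize .` is the usual way to fix them.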
12 changes: 12 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,12 @@
version: 2
updates:
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"
      day: "monday"
      time: "12:00"
      timezone: "Asia/Shanghai"
    commit-message:
      prefix: "[CI]"
