Commit baa58db
Metal weight repacking: tile-major Q4 layout + coalesced GPU kernel
New Metal kernels:
- matmul_tq_q4_repacked: SIMD-group coalesced reads from the tile-major layout
  (32 rows per tile; adjacent threads read consecutive memory)
- kv_cache_write: GPU-side KV cache update (eliminates the Phase A commit)
Weight repacking infrastructure:
- tq_metal_repack_q4(): row-major → tile-major Q4 block transposition
- Lazy repack cache: the first GPU dispatch triggers the repack; subsequent dispatches reuse the cached copy
- 128-entry cache for model weight matrices
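
The repack and the lazy cache described above can be sketched on the CPU side roughly as follows. This is an illustration under stated assumptions, not the code in this commit: the 18-byte block size, the pointer-keyed linear cache, and the names tile_major_index, repack_q4_tile_major and get_repacked are all hypothetical. Only the 32-row tile and the 128-entry limit come from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TILE_ROWS   32   /* rows per tile, as stated in the commit message */
#define BLOCK_BYTES 18   /* ASSUMED block size: 2-byte scale + 16 nibble bytes */
#define CACHE_SLOTS 128  /* cache size, as stated in the commit message */

/* Block index of (row, col) in tile-major order: inside a tile, the
 * TILE_ROWS blocks that share a block column are stored back to back,
 * so threads handling adjacent rows read adjacent blocks. */
static size_t tile_major_index(size_t row, size_t col, size_t blocks_per_row) {
    size_t tile = row / TILE_ROWS;
    size_t lane = row % TILE_ROWS;
    return (tile * blocks_per_row + col) * TILE_ROWS + lane;
}

/* Copy row-major blocks from src into tile-major order in dst.
 * n_rows must be a multiple of TILE_ROWS; dst and src must not overlap. */
static void repack_q4_tile_major(uint8_t *dst, const uint8_t *src,
                                 size_t n_rows, size_t blocks_per_row) {
    assert(n_rows % TILE_ROWS == 0);
    for (size_t r = 0; r < n_rows; r++) {
        for (size_t c = 0; c < blocks_per_row; c++) {
            size_t s = r * blocks_per_row + c;
            size_t d = tile_major_index(r, c, blocks_per_row);
            memcpy(dst + d * BLOCK_BYTES, src + s * BLOCK_BYTES, BLOCK_BYTES);
        }
    }
}

/* Hypothetical lazy cache keyed by the source weight pointer. */
typedef struct {
    const void *src;      /* original row-major weights */
    uint8_t    *repacked; /* tile-major copy */
} repack_entry;

static repack_entry g_cache[CACHE_SLOTS];
static size_t g_cache_used;

/* Return the cached tile-major copy, repacking on first use.
 * Returns NULL when the cache is full or allocation fails, so the
 * caller can fall back to another path. */
static uint8_t *get_repacked(const uint8_t *src, size_t n_rows,
                             size_t blocks_per_row) {
    for (size_t i = 0; i < g_cache_used; i++)
        if (g_cache[i].src == src) return g_cache[i].repacked;
    if (g_cache_used == CACHE_SLOTS) return NULL;
    uint8_t *dst = malloc(n_rows * blocks_per_row * BLOCK_BYTES);
    if (!dst) return NULL;
    repack_q4_tile_major(dst, src, n_rows, blocks_per_row);
    g_cache[g_cache_used++] = (repack_entry){ src, dst };
    return dst;
}
```

With this indexing, rows 0..31 of block column 0 occupy destination blocks 0..31, which is the access pattern a 32-wide SIMD group would read in one coalesced pass.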
Benchmark results (M1 Pro, tok/s):
| Config         | SmolLM2 135M | Llama 3.2 3B | Notes                           |
|----------------|--------------|--------------|---------------------------------|
| CPU NEON Q4    | 96           | 17           | current best                    |
| GPU non-repack | 22           | 0.6          |                                 |
| GPU repacked   | 27           | 0.6          | +23% from repack (SmolLM2 only) |
| llama.cpp GPU  | 128          | 55           |                                 |
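
For reference, the relative gain quoted for the repacked kernel can be checked from the table values with a small helper. The function name percent_gain is illustrative only.

```c
/* Relative change from a baseline throughput to an updated one, in percent. */
static double percent_gain(double baseline, double updated) {
    return (updated / baseline - 1.0) * 100.0;
}
```

percent_gain(22.0, 27.0) is about 22.7, which matches the rounded +23% for SmolLM2 135M; percent_gain(0.6, 0.6) is 0, so at the precision reported the repack made no measurable difference on Llama 3.2 3B.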
Conclusion: Q4 nibble extraction (integer bit ops) is fundamentally slow
on the Apple GPU, which is optimized for float/half arithmetic. The CPU NEON
fused dot product remains optimal for Q4 batch-1 inference. The GPU path is
disabled; the infrastructure is kept for future FP16/BF16 weights, which need
no bit extraction.
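
To make the "nibble extraction" cost concrete, here is a scalar sketch of a Q4 block dot product. The block layout is an assumption borrowed from common Q4 formats (32 weights per block, low nibbles holding the first half, high nibbles the second half, values offset by 8) and a float scale is used for portability; the layout actually used in this repository may differ, and q4_block and q4_block_dot are hypothetical names.

```c
#include <stdint.h>

#define QK 32  /* weights per block (ASSUMED) */

typedef struct {
    float   scale;       /* per-block scale; real formats often store fp16 */
    uint8_t qs[QK / 2];  /* two 4-bit weights packed per byte */
} q4_block;

/* Dot product of one quantized block with QK float activations.
 * Every weight costs a mask or shift, a subtract, and an int-to-float
 * conversion before the multiply-add can happen. */
static float q4_block_dot(const q4_block *b, const float *x) {
    float acc = 0.0f;
    for (int j = 0; j < QK / 2; j++) {
        int lo = (b->qs[j] & 0x0F) - 8;  /* weight j */
        int hi = (b->qs[j] >> 4) - 8;    /* weight j + QK/2 */
        acc += (float)lo * x[j] + (float)hi * x[j + QK / 2];
    }
    return acc * b->scale;
}
```

With FP16 or BF16 weights the mask, shift, and subtract steps disappear and the inner loop reduces to loads and multiply-adds, which is the case the retained GPU infrastructure is meant for.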
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 59f2203 commit baa58db
File tree
3 files changed (+156, -12 lines)
- src
  - backend/metal
  - engine