Commit 49c6605
feat(wasm): chat KV cache reuse — turn N+1 is near-instant in browser (#51)
PR #50 added text-prefix matching to src/engine/tq_generate.c (used by
the HTTP server). This PR ports it to quant.h (single-header) so the
WASM browser demo and Python wheel get the same speedup.
Four layers:
1. **quant.h**: ported tq_generate_chat_text from src/engine. Added
cached_text field to quant_ctx struct. quant_chat() now uses the
text-prefix path instead of the token-LCP path. quant_free_ctx()
frees cached_text. Pass NULL prompt to reset session (frees
cached_text too).
2. **wasm/quant_wasm.c**:
- wasm_generate_async / wasm_generate now call quant_chat() instead
of quant_generate() (which destroyed the cache via free+recreate
of g_ctx every call — biggest reason WASM was slow on multi-turn).
- Reuse the existing g_ctx across calls; only update temperature/
top_p/max_tokens fields (kv_compress is immutable post-creation).
- New wasm_reset_chat() for starting a new chat session.
3. **wasm/index.html**:
- Accumulates ChatML history client-side (chatHistory string).
Each turn appends `<|im_start|>user\n${text}<|im_end|>\n
<|im_start|>assistant\n` and sends the FULL history to WASM.
- The C side's text-prefix matching reuses everything before the
new turn — turn N's prefill is O(new user message), not
O(full history).
- After response, appends model output + <|im_end|>\n so the next
turn matches the cached_text byte-for-byte.
- Loading message differentiates first turn ("Processing prompt
— may take a few seconds") vs subsequent ("Generating...").
4. **wasm/build.sh**: exports _wasm_reset_chat.
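The text-prefix matching described in layers 1 and 3 can be sketched roughly as below. This is a minimal illustration, not the code in quant.h: `chat_session`, `common_prefix_len`, `chat_reuse_offset`, and `chat_set_cached` are hypothetical names, and the sketch assumes the conservative policy of falling back to the cold path whenever the cached text is not a complete prefix of the new prompt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical session state: the text whose KV entries are already cached. */
typedef struct {
    char *cached_text;   /* NULL when the session is empty */
} chat_session;

/* Length in bytes of the longest common prefix of a and b. */
static size_t common_prefix_len(const char *a, const char *b) {
    size_t n = 0;
    while (a[n] != '\0' && a[n] == b[n]) n++;
    return n;
}

/*
 * Byte offset in `prompt` where prefill has to start.
 * 0 means nothing can be reused (cold path). Assumption of this sketch:
 * reuse only when the whole cached text is a prefix of the prompt.
 */
static size_t chat_reuse_offset(const chat_session *s, const char *prompt) {
    assert(s != NULL && prompt != NULL);
    if (s->cached_text == NULL) return 0;
    size_t cached_len = strlen(s->cached_text);
    size_t match = common_prefix_len(s->cached_text, prompt);
    return (match == cached_len) ? match : 0;
}

/*
 * Store a copy of `text` (prompt plus generated reply) as the new cached
 * text. Passing NULL resets the session. Returns 0 on success, -1 if the
 * allocation fails (the old cached text is kept in that case).
 */
static int chat_set_cached(chat_session *s, const char *text) {
    assert(s != NULL);
    char *copy = NULL;
    if (text != NULL) {
        size_t n = strlen(text) + 1;
        copy = malloc(n);
        if (copy == NULL) return -1;
        memcpy(copy, text, n);
    }
    free(s->cached_text);
    s->cached_text = copy;
    return 0;
}
```

The byte-for-byte requirement in layer 3 follows from this comparison: if the client rebuilt the history with even one differing byte (for example a missing `\n` after `<|im_end|>`), the match would stop early and the turn would take the cold path.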
Validated end-to-end with the C test (real response replay):
turn 1: 206 ms (cold, SLOW path)
turn 2: 315 ms (FAST text_match=64)
turn 5: 437 ms (FAST text_match=321)
turn 10: 637 ms (FAST text_match=750)
Every turn after the first hits the FAST text-prefix path. The
remaining ~40 ms/turn growth (437 ms at turn 5 to 637 ms at turn 10) is
the unavoidable O(n) attention cost.
For the WASM browser demo, this means: instead of every turn taking
full prefill time (5-10s for a 0.8B model), only turn 1 is slow.
Turns 2+ feel instantaneous to the user.
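Only turn 1 is slow because the context now survives between calls. A rough sketch of that reuse pattern follows. All names here (`demo_ctx`, `demo_get_ctx`, `demo_reset_chat`, `cached_turns`) are hypothetical stand-ins, not the actual symbols in wasm/quant_wasm.c, and the sketch assumes a reset clears the chat state while keeping the context allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the long-lived generation context. */
typedef struct {
    float temperature;
    float top_p;
    int   max_tokens;
    int   cached_turns;  /* stand-in for the cached chat state */
} demo_ctx;

static demo_ctx *g_demo_ctx = NULL;  /* created once, reused across calls */
static int g_creations = 0;          /* counts how often a context was built */

/*
 * Return the shared context, creating it on first use. Later calls only
 * update the per-request sampling fields, so cached state is preserved.
 */
static demo_ctx *demo_get_ctx(float temperature, float top_p, int max_tokens) {
    assert(max_tokens > 0);
    if (g_demo_ctx == NULL) {
        g_demo_ctx = calloc(1, sizeof *g_demo_ctx);
        if (g_demo_ctx == NULL) return NULL;
        g_creations++;
    }
    g_demo_ctx->temperature = temperature;
    g_demo_ctx->top_p = top_p;
    g_demo_ctx->max_tokens = max_tokens;
    return g_demo_ctx;
}

/* Start a new chat: drop the cached state but keep the context itself. */
static void demo_reset_chat(void) {
    if (g_demo_ctx != NULL) g_demo_ctx->cached_turns = 0;
}
```

The free-and-recreate approach the commit replaces corresponds to building a fresh context in every call, which discards the cached state each turn.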
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 471a5f4 commit 49c6605
6 files changed
Lines changed: 280 additions & 25 deletions