Commit ee0b3e6
feat: chat KV cache hardening — multi-session, overflow-safe, observable
Follow-up to PR #48 (chat KV cache reuse). Audited the implementation
and addressed 4 P0/P1 fragility points found in production-like use:
1. **Multi-session safety (P0)** — quant-server held a single global
KV state. Two concurrent chat clients would corrupt each other's
cache. Now there's a per-session table (MAX_SESSIONS=16) keyed by
the OpenAI-compatible "user" field in the request body. Sessions
are LRU-evicted when full. Each session has its own kv_state,
cached_tokens, and last_used. The default session ("default")
preserves the original single-client behavior.
2. **Heap-allocate prompt buffer (P0)** — tq_generate_continue used
`int new_tokens[4096]` on the stack, which silently truncated
prompts longer than 4096 tokens. Replaced with a malloc'd buffer
sized up to model->config.max_seq_len. The realloc failure paths
now free the heap buffer before returning -1.
3. **Sliding window on overflow (P1)** — when n_new + max_tokens
would exceed max_seq_len, drop the oldest prompt tokens, keep
the most recent (max_seq_len - max_tokens - 32) tokens, and
force a full reprefill since the prefix shifted. Prevents
silent failure / generation truncation.
4. **Cache hit metrics (P1)** — TQ_CHAT_DEBUG=1 env var prints
per-call metrics: prefix_hit (LCP length), prefill (new tokens
processed), generated, cached. Useful for diagnosing chat
clients with poor cache reuse.
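For illustration, the per-session table from item 1 could look roughly like the sketch below. This is not the code from this commit: `session_t`, `session_get`, `SESSION_ID_LEN`, and the logical clock are hypothetical names, and `n_cached` is a stand-in for the real `kv_state` / `cached_tokens` fields. Only `MAX_SESSIONS=16`, keying on the "user" field, the "default" session, and LRU eviction come from the description above.

```c
#include <stdint.h>
#include <string.h>

#define MAX_SESSIONS   16
#define SESSION_ID_LEN 64   /* assumed limit on the stored "user" value */

/* Hypothetical session record; the real one also owns kv_state and cached_tokens. */
typedef struct {
    char     id[SESSION_ID_LEN]; /* value of the request's "user" field */
    int      in_use;
    int      n_cached;           /* stand-in for the per-session cache */
    uint64_t last_used;          /* logical clock tick of the last access */
} session_t;

static session_t g_sessions[MAX_SESSIONS];
static uint64_t  g_clock;

/* Return the session for `user`; create it, evicting the LRU slot when full. */
session_t *session_get(const char *user) {
    if (!user || !*user) user = "default";

    session_t *slot = NULL; /* first free slot, if any */
    session_t *lru  = NULL; /* least recently used occupied slot */
    for (int i = 0; i < MAX_SESSIONS; i++) {
        session_t *s = &g_sessions[i];
        if (!s->in_use) {
            if (!slot) slot = s;
            continue;
        }
        if (strncmp(s->id, user, SESSION_ID_LEN - 1) == 0) {
            s->last_used = ++g_clock; /* hit: refresh recency */
            return s;
        }
        if (!lru || s->last_used < lru->last_used) lru = s;
    }

    if (!slot) slot = lru;  /* table full: evict the LRU session */
    /* A real implementation would free the evicted session's KV state here. */
    memset(slot, 0, sizeof *slot);
    strncpy(slot->id, user, SESSION_ID_LEN - 1);
    slot->in_use = 1;
    slot->last_used = ++g_clock;
    return slot;
}
```

With this shape, `session_get("alice")` and `session_get("bob")` return distinct slots, so one client's calls cannot overwrite the other's cache unless the table fills and the older session is evicted. The sketch is not thread-safe; a server handling requests concurrently would need a lock around the lookup.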
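The sliding-window arithmetic from item 3 can be sketched as follows. The function name `apply_sliding_window` and the `reprefill` out-parameter are hypothetical; the trigger condition and the `max_seq_len - max_tokens - 32` budget are taken from the description above. The clamp for a non-positive budget is an assumption, since the commit message does not say how that case is handled.

```c
#include <string.h>

/* Trim `tokens` in place to the most recent entries that fit the context.
 * Returns the new token count. Sets *reprefill to 1 when tokens were
 * dropped, because the cached prefix no longer lines up with the prompt. */
int apply_sliding_window(int *tokens, int n_new, int max_tokens,
                         int max_seq_len, int *reprefill) {
    *reprefill = 0;
    if (n_new + max_tokens <= max_seq_len) return n_new; /* fits as-is */

    int keep = max_seq_len - max_tokens - 32; /* 32-token safety margin */
    if (keep < 1) keep = 1;         /* assumption: clamp a degenerate budget */
    if (keep > n_new) keep = n_new;

    /* Drop the oldest tokens by shifting the most recent `keep` to the front. */
    memmove(tokens, tokens + (n_new - keep), (size_t)keep * sizeof *tokens);
    *reprefill = 1;
    return keep;
}
```

As a worked example, with max_seq_len = 4096, max_tokens = 256, and a 4000-token prompt, 4000 + 256 exceeds 4096, so the budget is 4096 - 256 - 32 = 3808 tokens and the oldest 192 are dropped.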
Verified end-to-end with 2 concurrent sessions:
alice cold: 334 ms
bob cold: 78 ms (separate session, no cache pollution)
alice 2nd: 78 ms (alice's cache survived bob's calls)
bob 2nd: 76 ms
... (all subsequent calls ~75-82 ms across both sessions)
Known limitation: assistant response tokens generated by sample_topp
do not always match the BPE re-tokenization of the same response
text in subsequent prompts. This caps the per-turn LCP at the prompt
boundary. Real fix is server-side text-prefix matching (cache the
last prompt text and tokenize only the suffix), tracked for the
next round.
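To make the limitation concrete, here is a minimal sketch of a token-level longest-common-prefix (LCP) check. `token_lcp` is a hypothetical helper, not the function used in this commit.

```c
/* Length of the longest common prefix of the cached and incoming token
 * sequences. Only tokens past this point need to be prefilled again. */
int token_lcp(const int *cached, int n_cached,
              const int *prompt, int n_prompt) {
    int n = n_cached < n_prompt ? n_cached : n_prompt;
    int i = 0;
    while (i < n && cached[i] == prompt[i]) i++;
    return i;
}
```

Suppose, using made-up token IDs, the cache holds `{1, 2, 3, 10, 11}` (a three-token prompt followed by two sampled response tokens) and the next request tokenizes to `{1, 2, 3, 12, 4, 5}` because BPE encodes the same response text as the single token 12. `token_lcp` then returns 3, so reuse stops at the previous prompt boundary even though the text is identical.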
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent ee048f7 commit ee0b3e6
3 files changed, +178 -43 lines