Commit 1b9b79c
OCWC22 and claude committed
feat(isb1): add KV cache stress benchmark with multi-turn synthetic traces

Add ISB-1 (Inference Stress Benchmark) — a multi-turn, long-context KV cache stress testing dataset for InferenceX V3.

## What this adds

**35 synthetic multi-turn traces** across 7 context bands (8K → 1M+ tokens):
- 6 workload families: long_chat, coding, agent, rag, cache_stress, multimodal
- KV stress patterns: prefix reuse, offload cliff, compaction, reactivation, fanout
- Real conversation content with 60-95% prefix overlap (enables prefix cache testing)
- Context assets from 15KB to 6.6MB inlined into traces for honest token counts

**Export bundles** for vLLM + SGLang replay:
- extension_131k: DeepSeek-R1, GPT-OSS, Qwen 3.5 (H200/B200)
- preview/long_context_500k: Qwen 3.5 500K context stress test
- preview/long_context_1m: Qwen 3.5 1M context stress test

**10 KV stress sweep configs** (isb1-kv-stress-pr993.yaml):
- 3 models × 2 GPUs × 2 engines
- Sweep: 2→256 concurrent users × on/off/noprefix offload modes × 1800s

## Coexistence with kv-cache-tester

This dataset complements PR #993's kv-cache-tester (522 real Claude Code traces):
- kv-cache-tester: real workload distribution, natural performance profile
- ISB1: controlled KV stress patterns that force offload cliffs and cache pressure

No files in experimental/multiturn/ are modified. Separate config files, separate data directory (datasets/isb1/), shared replay infrastructure.

## Benchmark infrastructure

- benchmark_export_replay.py: replay harness with actual_context_len telemetry
- process_result_isb1.py: result aggregation with KV metrics
- Prometheus metrics: kv_cache_usage, prefix_cache_hits, kv_offload_bytes
- Pareto frontier: throughput vs p99 TTFT at each concurrency level

## Why this matters (from GTC 2026)

> "Right now the benchmarks are kind of showing the worst the chips will
> actually perform... for V3 we want to add agentic benchmarks like really
> good representative multi-turn QA chat benchmarks where there are a ton
> of client sessions each with multiple turns and we'll enable prefix caching."
> — Cameron Quilici

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
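
The "Pareto frontier: throughput vs p99 TTFT" step could be sketched roughly as follows. This is only an illustration, not the code in process_result_isb1.py: the `SweepPoint` type, its field names, and the sample numbers are hypothetical. It assumes each concurrency level yields one (throughput, p99 TTFT) pair, and it keeps the points that no other point beats on both axes.

```python
from typing import List, NamedTuple


class SweepPoint(NamedTuple):
    """One sweep result. Hypothetical shape, not the commit's actual schema."""
    concurrency: int
    throughput_tok_s: float  # higher is better
    p99_ttft_s: float        # lower is better


def pareto_frontier(points: List[SweepPoint]) -> List[SweepPoint]:
    """Return the points not dominated on (throughput up, p99 TTFT down)."""
    # Sort by latency ascending; break ties by throughput descending so the
    # best point at a given latency is seen first.
    ordered = sorted(points, key=lambda p: (p.p99_ttft_s, -p.throughput_tok_s))
    frontier: List[SweepPoint] = []
    best_throughput = float("-inf")
    for point in ordered:
        # A point earns its place only if it improves throughput over every
        # point with equal or lower latency.
        if point.throughput_tok_s > best_throughput:
            frontier.append(point)
            best_throughput = point.throughput_tok_s
    return frontier


# Made-up numbers, for illustration only.
sample = [
    SweepPoint(2, 100.0, 0.10),
    SweepPoint(4, 180.0, 0.20),
    SweepPoint(8, 170.0, 0.30),   # dominated by the concurrency=4 point
    SweepPoint(16, 300.0, 0.90),
]
frontier = pareto_frontier(sample)
```

Here `frontier` holds the concurrency 2, 4, and 16 points. The concurrency 8 point drops out because the concurrency 4 point has both higher throughput and lower p99 TTFT.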
1 parent 6cb8291 commit 1b9b79c

162 files changed

Lines changed: 27657 additions & 99 deletions

.gitattributes

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+datasets/isb1/exports/preview/long_context_1m/*.json filter=lfs diff=lfs merge=lfs -text
+datasets/isb1/exports/**/*.json filter=lfs diff=lfs merge=lfs -text
