[experimental] add multi-turn KV cache stress benchmark traces by OCWC22 · Pull Request #1032 · SemiAnalysisAI/InferenceX

OCWC22 · 2026-04-15T04:42:58Z

Summary

Add multi-turn, long-context KV cache stress testing traces for realistic inference benchmarking.

35 synthetic multi-turn traces across 6 context bands (8K → 1M+ tokens)
Real conversation content with 60-95% prefix overlap (enables prefix caching measurement)
10 KV stress sweep configs for vLLM + SGLang on H200/B200
Coexists cleanly with kv-cache-tester (separate directory, separate configs)

Why this matters

Current benchmarks use random data, so they exercise no prefix caching, no multi-turn sessions, and no KV cache reuse. This PR adds realistic multi-turn traces that:

Enable prefix caching measurement — each turn sends all prior turns as prefix, so cache hit rates are directly measurable
Force KV offload at scale — traces at 128K-1M context depth push KV cache past single-GPU HBM limits
Test agentic workloads — tool-call loops, multi-step reasoning, session resume after idle
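
To make the prefix-caching point concrete, here is a minimal sketch of how per-turn prefix overlap could be estimated from tokenized prompts. The function names and the toy token IDs are illustrative assumptions, not part of this PR's replay harness.

```python
from typing import List, Sequence


def shared_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of leading tokens that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_overlap_per_turn(turn_prompts: List[Sequence[int]]) -> List[float]:
    """For each turn after the first, the fraction of its prompt tokens
    that match the leading tokens of the previous turn's prompt."""
    ratios = []
    for prev, cur in zip(turn_prompts, turn_prompts[1:]):
        ratios.append(shared_prefix_len(prev, cur) / len(cur) if cur else 0.0)
    return ratios


# Toy session (hypothetical token IDs): each turn resends the full history.
turn1 = [1, 2, 3, 4]
turn2 = turn1 + [5, 6, 7, 8]   # 4 of 8 prompt tokens are a reusable prefix
turn3 = turn2 + [9, 10]        # 8 of 10 prompt tokens are a reusable prefix
print(prefix_overlap_per_turn([turn1, turn2, turn3]))  # [0.5, 0.8]
```

This ratio is only an upper bound on what a server can reuse; actual cache hit rates depend on block granularity and eviction in the serving engine.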

Sweep configuration

users: [2, 4, 8, 16, 32, 64, 128, 256]
offload-modes: ["on", "off", "noprefix"]
duration-s: 1800

Each config produces a throughput vs p99 TTFT Pareto frontier across concurrency levels and offload modes.
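
As a rough sketch of what the sweep implies, the snippet below expands the users × offload-mode grid and extracts a Pareto frontier (maximize throughput, minimize p99 TTFT) from a set of results. The result field names and the sample numbers are made up for illustration; they are not measurements and not the schema used by plot_pareto.py.

```python
from itertools import product
from typing import Dict, List

USERS = [2, 4, 8, 16, 32, 64, 128, 256]
OFFLOAD_MODES = ["on", "off", "noprefix"]

# One run per (users, offload-mode) pair: 8 x 3 = 24 points per config.
grid = list(product(USERS, OFFLOAD_MODES))


def pareto_frontier(points: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Keep points that no other point beats on both axes
    (higher throughput and lower p99 TTFT)."""
    # Visit the highest-throughput points first; ties are broken by lower TTFT.
    ordered = sorted(points, key=lambda p: (-p["throughput"], p["ttft_p99"]))
    frontier = []
    best_ttft = float("inf")
    for p in ordered:
        # A point survives only if it improves on the best TTFT seen so far.
        if p["ttft_p99"] < best_ttft:
            frontier.append(p)
            best_ttft = p["ttft_p99"]
    return frontier


# Illustrative values only (hypothetical field names, invented numbers).
points = [
    {"users": 2, "throughput": 100.0, "ttft_p99": 0.2},
    {"users": 8, "throughput": 300.0, "ttft_p99": 0.5},
    {"users": 16, "throughput": 280.0, "ttft_p99": 0.9},  # dominated by users=8
    {"users": 32, "throughput": 500.0, "ttft_p99": 1.5},
]
print(len(grid))                                       # 24
print([p["users"] for p in pareto_frontier(points)])   # [32, 8, 2]
```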

Context bands

Band	Tokens	What it stresses
8K	8-16K	Baseline (no KV pressure)
32K	32-64K	Batch size competition
64K	64-96K	KV competes with model weights
131K	96-131K	Offload threshold on H100 80GB
500K	384-512K	Must offload or use H200+
1M	768K-1M+	Extreme — GB200/B200 only
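
For intuition on why the larger bands force offload, the sketch below applies the usual KV cache size estimate for standard multi-head or grouped-query attention: 2 (key and value) × layers × KV heads × head dimension × bytes per element, per token. The model shape is a hypothetical example chosen for round numbers, not one of the models benchmarked here, and architectures with compressed KV layouts will differ.

```python
def kv_bytes_per_token(num_layers: int, num_kv_heads: int,
                       head_dim: int, bytes_per_elem: int) -> int:
    # Factor 2: one key vector and one value vector per layer per KV head.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def kv_gib(context_tokens: int, **model: int) -> float:
    """KV cache footprint in GiB for a single session of the given length."""
    return context_tokens * kv_bytes_per_token(**model) / 2**30


# HYPOTHETICAL model shape, for illustration only (1-byte KV elements).
toy = dict(num_layers=64, num_kv_heads=8, head_dim=128, bytes_per_elem=1)

for band, tokens in [("8K", 8_192), ("131K", 131_072), ("1M", 1_048_576)]:
    print(f"{band}: {kv_gib(tokens, **toy):.1f} GiB per session")
# 8K: 1.0 GiB per session
# 131K: 16.0 GiB per session
# 1M: 128.0 GiB per session
```

Under these assumptions a single 1M-token session already exceeds 80 GB of HBM before counting weights, and the footprint scales linearly with the number of concurrent sessions.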

Coexistence with kv-cache-tester (PR #993)

This complements kv-cache-tester's 522 real Claude Code traces:

kv-cache-tester: real workload distribution, natural performance profile
This PR: controlled KV stress patterns (offload cliff, compaction, reactivation, prefix fanout)

No files in experimental/multiturn/ are modified. Separate directory (datasets/isb1/), separate configs.

Test plan

generate_sweep_configs.py dry-run resolves all configs
Export files load correctly in benchmark_export_replay.py
No conflicts with existing configs or workflows
Token counts in traces match actual content (no inflation)
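
The last check in the plan could look roughly like the sketch below: recount each turn's tokens and flag turns whose declared count drifts beyond a tolerance. The field names `content` and `declared_tokens` are hypothetical, and the whitespace counter is only a stand-in for the real model tokenizer.

```python
from typing import Callable, Dict, List


def find_inflated_turns(turns: List[Dict], count_tokens: Callable[[str], int],
                        tolerance: float = 0.05) -> List[int]:
    """Return indices of turns whose declared token count differs from the
    recounted value by more than `tolerance` (relative to the recount)."""
    bad = []
    for i, turn in enumerate(turns):
        actual = count_tokens(turn["content"])
        declared = turn["declared_tokens"]
        if abs(declared - actual) > tolerance * max(actual, 1):
            bad.append(i)
    return bad


def whitespace_count(text: str) -> int:
    # Stand-in tokenizer (assumption): a real check would use the model's tokenizer.
    return len(text.split())


turns = [
    {"content": "def add(a, b): return a + b", "declared_tokens": 7},
    {"content": "short reply", "declared_tokens": 500},  # inflated
]
print(find_inflated_turns(turns, whitespace_count))  # [1]
```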

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

Copilot

Pull request overview

Adds an ISB1 “KV cache stress / multi-turn replay” benchmarking surface (data + configs + runners + analysis utilities) to enable realistic long-context, high-prefix-overlap replay and offload-mode sweeps, while keeping it isolated from the existing experimental multiturn/kv-cache-tester lane.

Changes:

Add committed ISB1 export bundles (including preview 500K/1M lanes) and supporting ISB1 dataset documentation.
Add ISB1 KV-stress sweep workflow/config plus result summarization + gating utilities and tests.
Add/extend runner + single-node benchmark scripts (vLLM/SGLang + TriAttention variants) and GMI helper scripts for running/collecting sweeps.

Reviewed changes

Copilot reviewed 147 out of 150 changed files in this pull request and generated 1 comment.

File	Description
utils/verify_producer_sync.py	New utility to compare producer vs consumer export trees for selected ISB1 subtrees.
utils/test_verify_producer_sync.py	Tests for verify_producer_sync utility (pass + content mismatch).
utils/test_summarize_isb1.py	Tests for ISB1 operator summary output formatting/sections.
utils/test_process_result.py	Adds guards/tests ensuring ISB1 replay-style results don’t go through throughput processor.
utils/test_gate_isb1.py	Tests for ISB1 gating logic and strict failure behavior.
utils/process_result.py	Adds “fail fast” guards for ISB1 replay env/payload in throughput result processor.
runners/lib_single_node_script.sh	New helper to resolve benchmark script paths (runtime-aware for ISB1 replay).
runners/launch_h200-nb.sh	Uses new script resolver; executes resolved benchmark script.
runners/launch_h200-dgxc-slurm.sh	Uses new script resolver; executes resolved benchmark script.
runners/launch_h200-cw.sh	Uses new script resolver; executes resolved benchmark script.
runners/launch_h100-dgxc-slurm.sh	Uses new script resolver; executes resolved benchmark script.
runners/launch_h100-cw.sh	Uses new script resolver; executes resolved benchmark script.
runners/launch_h100-cr.sh	Uses new script resolver; expands env passthrough for ISB1 replay/kv-stress.
runners/launch_b200-nb.sh	Uses new script resolver; executes resolved benchmark script.
runners/launch_b200-dgxc.sh	Uses new script resolver; expands env passthrough for ISB1 replay/kv-stress.
runners/launch_b200-dgxc-slurm.sh	Uses new script resolver; executes resolved benchmark script; ensures cleanup.
experimental/multiturn/vllm_benchmark/scripts/trace_replay_qwen3.5_fp8_h200_vllm.sh	Adds experimental trace-replay runner script (vLLM).
experimental/multiturn/vllm_benchmark/scripts/trace_replay_qwen3.5_fp8_h200_sglang.sh	Adds experimental trace-replay runner script (SGLang).
experimental/multiturn/vllm_benchmark/scripts/trace_replay_qwen3.5_fp8_b200_vllm.sh	Adds experimental trace-replay runner script (vLLM).
experimental/multiturn/vllm_benchmark/scripts/trace_replay_qwen3.5_fp8_b200_sglang.sh	Adds experimental trace-replay runner script (SGLang).
experimental/multiturn/vllm_benchmark/scripts/trace_replay_gptoss_fp4_h200_vllm.sh	Adds experimental trace-replay runner script (vLLM).
experimental/multiturn/vllm_benchmark/scripts/trace_replay_gptoss_fp4_h200_sglang.sh	Adds experimental trace-replay runner script (SGLang).
experimental/multiturn/vllm_benchmark/scripts/trace_replay_gptoss_fp4_b200_vllm.sh	Adds experimental trace-replay runner script (vLLM).
experimental/multiturn/vllm_benchmark/scripts/trace_replay_gptoss_fp4_b200_sglang.sh	Adds experimental trace-replay runner script (SGLang).
experimental/multiturn/vllm_benchmark/scripts/trace_replay_dsr1_fp8_h200_vllm.sh	Adds experimental trace-replay runner script (vLLM).
experimental/multiturn/vllm_benchmark/scripts/trace_replay_dsr1_fp8_b200_vllm.sh	Adds experimental trace-replay runner script (vLLM).
experimental/multiturn/vllm_benchmark/launch/lmcache_vllm_h200.sh	Adds experimental LMCache-enabled vLLM launcher (H200).
experimental/multiturn/vllm_benchmark/launch/lmcache_vllm_b200.sh	Adds experimental LMCache-enabled vLLM launcher (B200).
experimental/multiturn/vllm_benchmark/launch/README.md	Docs for experimental LMCache launch helpers.
experimental/multiturn/vllm_benchmark/kv-cache-tester/traces/.gitkeep	Placeholder for external trace assets directory.
experimental/multiturn/vllm_benchmark/kv-cache-tester/README.md	Placeholder README describing expected kv-cache-tester population.
experimental/multiturn/vllm_benchmark/aiperf_traces/generate_aiperf_traces.py	Script to generate synthetic AIPerf-style sessions for replay.
experimental/multiturn/vllm_benchmark/README.md	Docs describing experimental parity surface and links to ISB1 scripts.
experimental/multiturn/vllm_benchmark/.gitignore	Ignores generated artifacts in experimental multiturn bench area.
experimental/multiturn/README.md	Replaces older notes with scoped “experimental notes” guidance and pointers to ISB1 ground truth.
experimental/README.md	Updates experimental directory warning + pointers to ISB1 ground truth docs.
datasets/isb1/scripts/plot_pareto.py	Adds Pareto frontier computation + optional plotting (TTFT p99 vs throughput).
datasets/isb1/scripts/gpu_profile_collector.sh	Adds nvidia-smi polling helper for GPU utilization/power logging.
datasets/isb1/scripts/gmi_test_matrix.sh	Adds a curated “matrix” driver for running portable benchmarks.
datasets/isb1/scripts/gmi_kv_sweep.sh	Adds concurrency × offload-mode sweep driver for portable benchmarks.
datasets/isb1/scripts/gmi_full_suite.sh	Adds full-suite portable runner across models/engines/bands (with skips).
datasets/isb1/scripts/generate_qwen35_low_band_exports.py	Generates Qwen3.5-specific low-band export bundles by rewriting filtered cells.
datasets/isb1/scripts/collect_sweep_results.py	Aggregates sweep results from DB or JSON dir; computes cliffs/benefits.
datasets/isb1/scripts/analyze_benchmark_distributions.py	Analyzes token/turn distributions for ISB1 exports or kv-cache traces.
datasets/isb1/scripts/adapt_trace_replay_result.py	Adapts kv-cache trace replay outputs into ISB1 replay JSON schema.
datasets/isb1/exports/preview/long_context_500k/manifest_qwen3.5.json	Adds preview 500k manifest (Git LFS pointer).
datasets/isb1/exports/preview/long_context_500k/manifest.json	Adds preview 500k manifest (Git LFS pointer).
datasets/isb1/exports/preview/long_context_500k/inferencex_trace_replay__coding_qwen3.5_xlc2_500k_preview_v1__vllm.json	Adds preview 500k export bundle (Git LFS pointer).
datasets/isb1/exports/preview/long_context_500k/inferencex_trace_replay__coding_qwen3.5_xlc2_500k_preview_v1__sglang.json	Adds preview 500k export bundle (Git LFS pointer).
datasets/isb1/exports/preview/long_context_500k/inferencex_trace_replay__coding_gptoss_xlc2_500k_preview_v1__vllm.json	Adds preview 500k export bundle (Git LFS pointer).
datasets/isb1/exports/preview/long_context_500k/inferencex_trace_replay__coding_gptoss_xlc2_500k_preview_v1__sglang.json	Adds preview 500k export bundle (Git LFS pointer).
datasets/isb1/exports/preview/long_context_500k/inferencex_trace_replay__chat_qwen3.5_xlc2_500k_preview_v1__vllm.json	Adds preview 500k export bundle (Git LFS pointer).
datasets/isb1/exports/preview/long_context_500k/inferencex_trace_replay__chat_qwen3.5_xlc2_500k_preview_v1__sglang.json	Adds preview 500k export bundle (Git LFS pointer).
datasets/isb1/exports/preview/long_context_500k/inferencex_trace_replay__chat_gptoss_xlc2_500k_preview_v1__vllm.json	Adds preview 500k export bundle (Git LFS pointer).
datasets/isb1/exports/preview/long_context_500k/inferencex_trace_replay__chat_gptoss_xlc2_500k_preview_v1__sglang.json	Adds preview 500k export bundle (Git LFS pointer).
datasets/isb1/exports/preview/long_context_500k/README.md	Documents bounded 500k-class preview lanes and claim boundary.
datasets/isb1/exports/preview/long_context_1m/manifest.json	Adds preview 1m manifest (Git LFS pointer).
datasets/isb1/exports/preview/long_context_1m/inferencex_trace_replay__coding_qwen3.5_ulc2_1m_preview_v1__vllm.json	Adds preview 1m export bundle (Git LFS pointer).
datasets/isb1/exports/preview/long_context_1m/inferencex_trace_replay__coding_qwen3.5_ulc2_1m_preview_v1__sglang.json	Adds preview 1m export bundle (Git LFS pointer).
datasets/isb1/exports/preview/long_context_1m/inferencex_trace_replay__chat_qwen3.5_ulc2_1m_preview_v1__vllm.json	Adds preview 1m export bundle (Git LFS pointer).
datasets/isb1/exports/preview/long_context_1m/inferencex_trace_replay__chat_qwen3.5_ulc2_1m_preview_v1__sglang.json	Adds preview 1m export bundle (Git LFS pointer).
datasets/isb1/exports/preview/long_context_1m/README.md	Documents gated 1M preview lane and manual config boundary.
datasets/isb1/exports/extension_64k/vllm/code_64k1k_qwen3.5.json	Adds extension 64k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/extension_64k/vllm/code_64k1k.json	Adds extension 64k generic bundle (Git LFS pointer).
datasets/isb1/exports/extension_64k/vllm/chat_64k1k_qwen3.5.json	Adds extension 64k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/extension_64k/vllm/chat_64k1k.json	Adds extension 64k generic bundle (Git LFS pointer).
datasets/isb1/exports/extension_64k/sglang/code_64k1k_qwen3.5.json	Adds extension 64k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/extension_64k/sglang/code_64k1k.json	Adds extension 64k generic bundle (Git LFS pointer).
datasets/isb1/exports/extension_64k/sglang/chat_64k1k_qwen3.5.json	Adds extension 64k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/extension_64k/sglang/chat_64k1k.json	Adds extension 64k generic bundle (Git LFS pointer).
datasets/isb1/exports/extension_32k/vllm/code_32k1k_qwen3.5.json	Adds extension 32k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/extension_32k/vllm/code_32k1k.json	Adds extension 32k generic bundle (Git LFS pointer).
datasets/isb1/exports/extension_32k/vllm/chat_32k1k_qwen3.5.json	Adds extension 32k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/extension_32k/vllm/chat_32k1k.json	Adds extension 32k generic bundle (Git LFS pointer).
datasets/isb1/exports/extension_32k/sglang/code_32k1k_qwen3.5.json	Adds extension 32k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/extension_32k/sglang/code_32k1k.json	Adds extension 32k generic bundle (Git LFS pointer).
datasets/isb1/exports/extension_32k/sglang/chat_32k1k_qwen3.5.json	Adds extension 32k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/extension_32k/sglang/chat_32k1k.json	Adds extension 32k generic bundle (Git LFS pointer).
datasets/isb1/exports/extension_131k/vllm/code_131k1k_qwen3.5.json	Adds/updates extension 131k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/extension_131k/vllm/code_131k1k.json	Adds/updates extension 131k generic bundle (Git LFS pointer).
datasets/isb1/exports/extension_131k/vllm/chat_131k1k_qwen3.5.json	Adds/updates extension 131k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/extension_131k/vllm/chat_131k1k_dsr1.json	Adds/updates extension 131k DSR1 bundle (Git LFS pointer).
datasets/isb1/exports/extension_131k/vllm/chat_131k1k.json	Adds/updates extension 131k generic bundle (Git LFS pointer).
datasets/isb1/exports/extension_131k/sglang/code_131k1k_qwen3.5.json	Adds/updates extension 131k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/extension_131k/sglang/code_131k1k.json	Adds/updates extension 131k generic bundle (Git LFS pointer).
datasets/isb1/exports/extension_131k/sglang/chat_131k1k_qwen3.5.json	Adds/updates extension 131k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/extension_131k/sglang/chat_131k1k_dsr1.json	Adds/updates extension 131k DSR1 bundle (Git LFS pointer).
datasets/isb1/exports/extension_131k/sglang/chat_131k1k.json	Adds/updates extension 131k generic bundle (Git LFS pointer).
datasets/isb1/exports/core/vllm/code_8k1k_qwen3.5.json	Adds core 8k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/core/vllm/code_8k1k.json	Adds core 8k generic bundle (Git LFS pointer).
datasets/isb1/exports/core/vllm/chat_8k1k_qwen3.5.json	Adds core 8k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/core/vllm/chat_8k1k.json	Adds core 8k generic bundle (Git LFS pointer).
datasets/isb1/exports/core/sglang/code_8k1k_qwen3.5.json	Adds core 8k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/core/sglang/code_8k1k.json	Adds core 8k generic bundle (Git LFS pointer).
datasets/isb1/exports/core/sglang/chat_8k1k_qwen3.5.json	Adds core 8k Qwen bundle (Git LFS pointer).
datasets/isb1/exports/core/sglang/chat_8k1k.json	Adds core 8k generic bundle (Git LFS pointer).
datasets/isb1/README.md	Adds ISB1 consumer-package README with coverage inventory and claim boundary.
datasets/isb1/GMI_EXECUTION_PLAN.md	Adds execution plan/runbook for external GMI KV-stress benchmarking.
datasets/isb1/COEXISTENCE_WITH_KV_CACHE_TESTER.md	Adds coexistence plan doc for ISB1 vs kv-cache-tester surfaces.
datasets/isb1/.gitattributes	Adds attributes for exports (linguist + EOL handling).
benchmarks/single_node/qwen3.5triattn_fp8_h200_vllm.sh	Adds TriAttention vLLM benchmark script (H200).
benchmarks/single_node/qwen3.5triattn_fp8_h100_vllm.sh	Adds TriAttention vLLM benchmark script (H100).
benchmarks/single_node/qwen3.5_fp8_h200_vllm.sh	Adds/updates Qwen3.5 vLLM script (H200) with ISB1-aware prefix/offload behavior.
benchmarks/single_node/qwen3.5_fp8_h200_sglang.sh	Adds Qwen3.5 SGLang script (H200) with ISB1-aware radix/offload behavior.
benchmarks/single_node/qwen3.5_fp8_h100_vllm.sh	Adds Qwen3.5 vLLM script (H100).
benchmarks/single_node/qwen3.5_fp8_h100_sglang.sh	Adds Qwen3.5 SGLang script (H100).
benchmarks/single_node/qwen3.5_fp8_b200_vllm.sh	Adds Qwen3.5 vLLM script (B200).
benchmarks/single_node/qwen3.5_fp8_b200_sglang.sh	Adds Qwen3.5 SGLang script (B200).
benchmarks/single_node/gptosstriattn_fp4_h200_vllm.sh	Adds TriAttention vLLM benchmark script for GPT-OSS (H200).
benchmarks/single_node/gptosstriattn_fp4_h100_vllm.sh	Adds TriAttention vLLM benchmark script for GPT-OSS (H100).
benchmarks/single_node/gptoss_fp4_h200_sglang.sh	Adds GPT-OSS SGLang script (H200).
benchmarks/single_node/gptoss_fp4_h200.sh	Updates GPT-OSS H200 script to be ISB1-aware and align to run_single_node_benchmark.
benchmarks/single_node/gptoss_fp4_h100_sglang.sh	Adds GPT-OSS SGLang script (H100).
benchmarks/single_node/gptoss_fp4_h100.sh	Updates GPT-OSS H100 script to be ISB1-aware and align to run_single_node_benchmark.
benchmarks/single_node/gptoss_fp4_b200_sglang.sh	Adds GPT-OSS SGLang script (B200).
benchmarks/single_node/gptoss_fp4_b200.sh	Updates GPT-OSS B200 script to be ISB1-aware and align to run_single_node_benchmark.
benchmarks/single_node/dsr1triattn_fp8_h200_vllm.sh	Adds TriAttention vLLM benchmark script for DSR1 (H200).
benchmarks/single_node/dsr1triattn_fp8_h100_vllm.sh	Adds TriAttention vLLM benchmark script for DSR1 (H100).
benchmarks/single_node/dsr1_fp8_h200_vllm.sh	Adds DSR1 vLLM script (H200).
benchmarks/single_node/dsr1_fp8_h200.sh	Updates DSR1 H200 SGLang script to be ISB1-aware and align to run_single_node_benchmark.
benchmarks/single_node/dsr1_fp8_b200_vllm.sh	Adds DSR1 vLLM script (B200).
benchmarks/single_node/dsr1_fp8_b200.sh	Updates DSR1 B200 SGLang script to be ISB1-aware and align to run_single_node_benchmark.
benchmarks/single_node/dsr1_fp4_b200.sh	Updates DSR1 FP4 B200 SGLang script to be ISB1-aware and align to run_single_node_benchmark.
.gitignore	Adds ignores for macOS metadata + local prompt exports + .claude.
.github/workflows/run-isb1-kv-stress-sweep.yml	Adds workflow_dispatch sweep driver for ISB1 KV-stress matrix runs.
.github/workflows/collect-results.yml	Adds ISB1-specific summary + gating report generation and uploads.
.github/configs/isb1-qwen-1m-preview.yaml	Adds a manual-only gated config for 1M Qwen preview runs.
.github/configs/isb1-kv-stress.yaml	Adds dedicated KV-stress sweep config (separate from isb1-master).
.gitattributes	Tracks ISB1 export JSON under Git LFS.
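
The producer/consumer comparison described for utils/verify_producer_sync.py can be pictured with a sketch like the following, which hashes every file in two trees and reports missing, extra, and content-mismatched paths. This is an assumption about the general technique, not the utility's actual implementation; all names and the temporary directory layout are illustrative.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def tree_digests(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its bytes."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_trees(producer: Path, consumer: Path) -> Tuple[List[str], List[str], List[str]]:
    """Return (missing in consumer, extra in consumer, content mismatches)."""
    a, b = tree_digests(producer), tree_digests(consumer)
    missing = sorted(set(a) - set(b))
    extra = sorted(set(b) - set(a))
    mismatched = sorted(k for k in set(a) & set(b) if a[k] != b[k])
    return missing, extra, mismatched


# Demo on throwaway directories (hypothetical layout).
with tempfile.TemporaryDirectory() as tmp:
    prod, cons = Path(tmp, "producer"), Path(tmp, "consumer")
    for root in (prod, cons):
        (root / "core").mkdir(parents=True)
    (prod / "core" / "a.json").write_text('{"v": 1}')
    (cons / "core" / "a.json").write_text('{"v": 2}')  # content mismatch
    (prod / "core" / "b.json").write_text("{}")        # missing in consumer
    result = diff_trees(prod, cons)

print(result)  # (['core/b.json'], [], ['core/a.json'])
```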

…races

Add ISB-1 (Inference Stress Benchmark), a multi-turn, long-context KV cache stress testing dataset for InferenceX V3.

## What this adds

**35 synthetic multi-turn traces** across 7 context bands (8K → 1M+ tokens):
- 6 workload families: long_chat, coding, agent, rag, cache_stress, multimodal
- KV stress patterns: prefix reuse, offload cliff, compaction, reactivation, fanout
- Real conversation content with 60-95% prefix overlap (enables prefix cache testing)
- Context assets from 15KB to 6.6MB inlined into traces for honest token counts

**Export bundles** for vLLM + SGLang replay:
- extension_131k: DeepSeek-R1, GPT-OSS, Qwen 3.5 (H200/B200)
- preview/long_context_500k: Qwen 3.5 500K context stress test
- preview/long_context_1m: Qwen 3.5 1M context stress test

**10 KV stress sweep configs** (isb1-kv-stress-pr993.yaml):
- 3 models × 2 GPUs × 2 engines
- Sweep: 2→256 concurrent users × on/off/noprefix offload modes × 1800s

## Coexistence with kv-cache-tester

This dataset complements PR SemiAnalysisAI#993's kv-cache-tester (522 real Claude Code traces):
- kv-cache-tester: real workload distribution, natural performance profile
- ISB1: controlled KV stress patterns that force offload cliffs and cache pressure

No files in experimental/multiturn/ are modified. Separate config files, separate data directory (datasets/isb1/), shared replay infrastructure.

## Benchmark infrastructure

- benchmark_export_replay.py: replay harness with actual_context_len telemetry
- process_result_isb1.py: result aggregation with KV metrics
- Prometheus metrics: kv_cache_usage, prefix_cache_hits, kv_offload_bytes
- Pareto frontier: throughput vs p99 TTFT at each concurrency level

## Why this matters (from GTC 2026)

> "Right now the benchmarks are kind of showing the worst the chips will actually perform... for V3 we want to add agentic benchmarks like really good representative multi-turn QA chat benchmarks where there are a ton of client sessions each with multiple turns and we'll enable prefix caching."
> -- Cameron Quilici

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Keep only configs whose (runtime, hardware, model) triples exist in the export files; this eliminates sweep generator failures
- Fix canonical-model-id to match export metadata (e.g., gpt_oss_120b not gptoss)
- Fix support-status to match export tiers (reviewed_preview vs unsupported)
- Remove configs for engines/GPUs not yet in exports (SGLang, Dynamo, TRT, Atom, AMD); these need export metadata updates before they can be added back
- Add workload-type field required by sweep generator schema
- Remove disagg/multinode fields not in KV stress schema

Sweep generator now passes: exit code 0, produces valid matrix rows.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
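
The first fix in this commit, keeping only configs whose (runtime, hardware, model) triple exists in the exports, amounts to a set-membership filter. A minimal sketch follows; the dictionary keys and sample records are hypothetical and do not reflect the actual config or export schema.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def supported_triples(export_cells: List[Dict[str, str]]) -> Set[Triple]:
    """Collect every (runtime, hardware, model) combination present in the exports."""
    return {(c["runtime"], c["hardware"], c["model"]) for c in export_cells}


def keep_resolvable(configs: List[Dict[str, str]],
                    export_cells: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop configs whose triple has no matching export cell."""
    supported = supported_triples(export_cells)
    return [c for c in configs
            if (c["runtime"], c["hardware"], c["model"]) in supported]


# Hypothetical records for illustration.
cells = [{"runtime": "vllm", "hardware": "h200", "model": "gpt_oss_120b"}]
configs = [
    {"name": "ok", "runtime": "vllm", "hardware": "h200", "model": "gpt_oss_120b"},
    {"name": "bad-id", "runtime": "vllm", "hardware": "h200", "model": "gptoss"},
    {"name": "no-export", "runtime": "trt", "hardware": "h200", "model": "gpt_oss_120b"},
]
print([c["name"] for c in keep_resolvable(configs, cells)])  # ['ok']
```

The `bad-id` record shows why the model identifier fix matters: a config using `gptoss` never matches an export keyed by `gpt_oss_120b`.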

cquil11 · 2026-04-16T13:49:22Z

Some good stuff in here. Will collab async on this one and take some stuff from this PR into experimental/agentic-benchmark MVP.

OCWC22 requested review from a team and Copilot April 15, 2026 04:42

github-project-automation bot added this to InferenceMAX Board Apr 15, 2026

claude bot reviewed Apr 15, 2026

Copilot started reviewing on behalf of OCWC22 April 15, 2026 04:43

Copilot AI reviewed Apr 15, 2026

Comment thread utils/verify_producer_sync.py

OCWC22 force-pushed the isb1/kv-cache-stress-benchmark branch 5 times, most recently from af64122 to 1b9b79c Compare April 15, 2026 08:35

cquil11 changed the title ~~feat: add multi-turn KV cache stress benchmark traces~~ [experimentak] add multi-turn KV cache stress benchmark traces Apr 15, 2026

OCWC22 force-pushed the isb1/kv-cache-stress-benchmark branch from 1b9b79c to ef90b64 Compare April 15, 2026 21:52

OCWC22 changed the title ~~[experimentak] add multi-turn KV cache stress benchmark traces~~ [experimental] add multi-turn KV cache stress benchmark traces Apr 15, 2026

OCWC22 force-pushed the isb1/kv-cache-stress-benchmark branch from ef90b64 to fbe9f79 Compare April 15, 2026 22:36

OCWC22 wants to merge 2 commits into SemiAnalysisAI:main from OCWC22:isb1/kv-cache-stress-benchmark

3 participants
