Expose cached-prefix identity for llm-d routing #73

Parent tracker: #69
Design context: #65

## Summary

Make cached-prefix replay compatible with scaled llm-d deployments, where `agentic-api` database state and vLLM KV cache residency are separate layers.

## Scope

- Make the replay prefix router-visible without sending the full token array.
- Define compact prefix identity: prefix hash, token count, block size, and eventually a block-hash chain compatible with llm-d precise-prefix routing.
- Ensure vLLM KV events are emitted for the Responses replay path.
- Align the replay-plan token stream, block size, and prefix/block hash with llm-d routing.
- Test active-active EPP, tiered KV offload, wrong-pod routing, pod restart, `AllBlocksCleared`, and shared-storage reload scenarios.
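Below is a minimal sketch of the compact prefix identity described above. The names `PrefixIdentity` and `compute_prefix_identity` are hypothetical. The hash encoding is an illustrative assumption: SHA-256 over little-endian uint32 token IDs, with each block hash chained through its parent's digest. It is not byte-compatible with vLLM's or llm-d's own block hashing, which would have to be matched exactly for precise-prefix routing. vLLM's prefix caching only caches full blocks, so here too only full blocks get a block hash. A trailing partial block contributes only to `prefix_hash` and `token_count`.

```python
import hashlib
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class PrefixIdentity:
    """Hypothetical compact, router-visible identity for a replay prefix."""
    prefix_hash: str                # hash over the whole token prefix
    token_count: int                # number of tokens in the prefix
    block_size: int                 # assumed to equal the vLLM KV block size
    block_hashes: tuple[str, ...]   # one chained hash per *full* block


def _encode(tokens):
    # Assumed fixed-width little-endian uint32 encoding, so hashes are
    # stable across processes (unlike Python's salted built-in hash()).
    return struct.pack(f"<{len(tokens)}I", *tokens)


def compute_prefix_identity(token_ids, block_size):
    """Build a hypothetical PrefixIdentity from the replay token stream."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    block_hashes = []
    parent = b""  # the first block has no parent
    for i in range(len(token_ids) // block_size):
        block = token_ids[i * block_size:(i + 1) * block_size]
        # Chain: each digest covers the parent digest plus this block's tokens,
        # so a matching hash at block k implies blocks 0..k all match.
        digest = hashlib.sha256(parent + _encode(block)).digest()
        block_hashes.append(digest.hex())
        parent = digest
    prefix_hash = hashlib.sha256(_encode(token_ids)).hexdigest()
    return PrefixIdentity(prefix_hash, len(token_ids), block_size, tuple(block_hashes))
```

With 40 tokens and a block size of 16, this yields two block hashes; the last 8 tokens affect only `prefix_hash`. Chaining lets a router compare identities block by block without ever receiving the token array.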

## Acceptance criteria

- Router-visible prefix hints are derived from the same token stream and block size vLLM uses for KV events.
- Wrong-pod, restart, and cache-clear cases fall back safely: the request still completes correctly, at worst with a full prefill instead of a cache hit.
- The design does not treat a process-local vLLM handle as durable global database state.
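The fallback behaviour in these criteria can be sketched as router-side logic. The sketch rests on two assumptions. First, the router keeps a per-pod set of block hashes fed by KV events. Second, the serving engine recomputes any block that is not actually resident, so a stale or wrong-pod choice costs prefill time rather than correctness. `PodCacheView`, `matched_blocks`, and `choose_pod` are hypothetical names. The handler methods are named after vLLM's `BlockStored`, `BlockRemoved`, and `AllBlocksCleared` KV events but are not its API. The view lives only in router memory and is rebuilt from events, so it is never treated as durable state.

```python
from dataclasses import dataclass, field


@dataclass
class PodCacheView:
    """Hypothetical router-side view of one pod's KV cache, rebuilt from KV events."""
    block_size: int
    blocks: set = field(default_factory=set)

    def on_block_stored(self, block_hash):
        self.blocks.add(block_hash)

    def on_block_removed(self, block_hash):
        self.blocks.discard(block_hash)

    def on_all_blocks_cleared(self):
        # Cache clear or pod restart: forget everything for this pod.
        self.blocks.clear()


def matched_blocks(view, block_hashes):
    """Count leading blocks present; a chained hash is useless without its parents."""
    n = 0
    for h in block_hashes:
        if h not in view.blocks:
            break
        n += 1
    return n


def choose_pod(pods, block_hashes, block_size):
    """Return (pod_name, matched_blocks), or (None, 0) to use default routing."""
    best, best_n = None, 0
    for name, view in pods.items():
        if view.block_size != block_size:
            continue  # hashes computed with a different block size are not comparable
        n = matched_blocks(view, block_hashes)
        if n > best_n:
            best, best_n = name, n
    return best, best_n
```

A restarted pod starts with an empty view, so it matches nothing until it emits new `BlockStored`-style events. A `(None, 0)` result means the router falls back to its ordinary scoring rather than guessing.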

