The eviction-triggered D2h offload reads KV blocks straight from the#235

Open
copybara-service[bot] wants to merge 1 commit into main from test_940532973

@copybara-service

The eviction-triggered D2h offload reads KV blocks straight from the
device buffer; without ordering, it can read a block mid-write while the
forward is still updating it in place, corrupting the offloaded copy.
Add a wait_ready parameter to D2h: when set, the torch KVCacheManager
re-resolves each involved layer/shard's live KV buffer and blocks on its
PJRT readiness event before the copy, so the store sees settled bytes.
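
To make the ordering concrete, here is a minimal sketch of the wait-before-copy pattern, not the actual implementation. `FakeKVBuffer`, `forward_write`, and this `d2h` signature are hypothetical stand-ins, and a `threading.Event` models the PJRT readiness event:

```python
import threading
import time


class FakeKVBuffer:
    """Hypothetical stand-in for one layer/shard's live device KV buffer."""

    def __init__(self, num_blocks, block_size):
        self.data = [bytearray(block_size) for _ in range(num_blocks)]
        # Models the buffer's readiness event: set once the write has settled.
        self.ready = threading.Event()

    def forward_write(self, block_id, value):
        """Simulates the forward pass updating a block in place."""
        block = self.data[block_id]
        for i in range(len(block)):
            block[i] = value
            time.sleep(0.001)  # widen the window in which a reader could race
        self.ready.set()


def d2h(buffers, block_ids, wait_ready=False):
    """Copies the requested blocks to host memory (assumed signature)."""
    if wait_ready:
        # Block on every involved buffer before touching its bytes.
        for buf in buffers:
            buf.ready.wait()
    return [[bytes(buf.data[b]) for b in block_ids] for buf in buffers]


def demo():
    buf = FakeKVBuffer(num_blocks=2, block_size=4)
    writer = threading.Thread(target=buf.forward_write, args=(1, 0xAB))
    writer.start()
    copied = d2h([buf], [1], wait_ready=True)
    writer.join()
    return copied
```

With `wait_ready=True` the copy in `demo()` always sees the fully written block; with `wait_ready=False` the same call could return a partially updated block, which is the corruption described above.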

AwaitBufferReady uses the stable PJRT C ABI (PJRT_Buffer_ReadyEvent +
PJRT_Event_Await), not xla::Future, whose C++ ABI mismatches across the
independent raiden/torch_tpu builds. The base D2h ignores wait_ready (its
registration holds carry a retired handle); only the torch override,
which retains the live device tensors, honors it.
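
The split between the base and torch behavior could look roughly like the sketch below. All class and method names here are hypothetical, and `await_ready` merely stands in for `AwaitBufferReady`; the sketch only illustrates that the flag is accepted everywhere but honored only where live buffers are available:

```python
class LiveBuffer:
    """Hypothetical handle to a live device tensor for one layer/shard."""

    def __init__(self, name):
        self.name = name
        self.await_count = 0

    def await_ready(self):
        # Stand-in for AwaitBufferReady; a real version would block on the
        # buffer's readiness event instead of counting calls.
        self.await_count += 1


class BaseManager:
    """Sketch of the base manager: accepts wait_ready but ignores it."""

    def d2h(self, block_ids, wait_ready=False):
        # No live handle to wait on here, so the flag is a no-op.
        return self._copy(block_ids)

    def _copy(self, block_ids):
        # Placeholder for the actual device-to-host copy.
        return list(block_ids)


class TorchManager(BaseManager):
    """Sketch of the torch override: retains live buffers and waits on them."""

    def __init__(self, live_buffers):
        self.live_buffers = live_buffers

    def d2h(self, block_ids, wait_ready=False):
        if wait_ready:
            for buf in self.live_buffers:
                buf.await_ready()
        return self._copy(block_ids)
```

Keeping the parameter on the base signature lets callers pass `wait_ready=True` uniformly, while only the subclass that can resolve live buffers pays the cost of waiting.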

copybara-service[bot] force-pushed the test_940532973 branch 2 times, most recently from ab73301 to c122f9f on June 30, 2026 19:47

PiperOrigin-RevId: 940532973