[UR][L0] Migrate discrete buffer through host when P2P is not accessible #22010

Open
ldorau wants to merge 1 commit into intel:sycl from ldorau:URL0_Migrate_discrete_buffer_through_host_when_P2P_is_not_accessible

ldorau commented May 13, 2026

When a buffer on a discrete GPU needs to be accessed from a different
device and P2P access is not enabled, migrate the data through a
temporary host buffer instead of returning UR_RESULT_ERROR_UNSUPPORTED_FEATURE.

Before migrating, wait for pending operations (from the wait list) to
complete, ensuring that prior kernel writes to the buffer are visible.

Fixes: #22007
Fixes: #22008
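
The two steps described above (wait for the pending operations, then stage the data through host memory) can be illustrated with a minimal, self-contained sketch. It is not the adapter code from this PR: `DeviceAllocation`, `PendingOp`, `blockingCopy`, and `migrateThroughHost` are hypothetical names, device memory is modeled with a `std::vector<char>`, and the blocking device copies are modeled with `std::memcpy`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical stand-in for a buffer allocation that lives on one device.
// In a real adapter this would be device memory, not a std::vector.
struct DeviceAllocation {
  int deviceId;
  std::vector<char> bytes;
};

// Hypothetical stand-in for a wait-list entry: calling it models blocking
// until the corresponding operation (for example a kernel write) finishes.
using PendingOp = std::function<void()>;

// Hypothetical stand-in for a blocking device<->host copy.
void blockingCopy(void *dst, const void *src, std::size_t size) {
  std::memcpy(dst, src, size);
}

// Migrate the contents of `src` to `dst` through a temporary host buffer.
void migrateThroughHost(const DeviceAllocation &src, DeviceAllocation &dst,
                        const std::vector<PendingOp> &waitList) {
  assert(src.deviceId != dst.deviceId);

  // Step 1: wait for pending operations so earlier writes to `src` are visible.
  for (const PendingOp &op : waitList) {
    op();
  }

  const std::size_t size = src.bytes.size();
  dst.bytes.resize(size);
  if (size == 0) {
    return; // nothing to copy
  }

  // Step 2: source device -> temporary host buffer.
  std::vector<char> hostBuf(size);
  blockingCopy(hostBuf.data(), src.bytes.data(), size);

  // Step 3: temporary host buffer -> target device.
  blockingCopy(dst.bytes.data(), hostBuf.data(), size);
}
```

Because the wait happens before the first copy, a write that is still pending when the migration is requested ends up in the target allocation as well.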

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
ldorau requested a review from kswiecicki on May 13, 2026, 12:29

ldorau commented May 14, 2026

Please review @intel/unified-runtime-reviewers-level-zero

// Migrate buffer through the host: copy from the current device to a
// temporary host buffer, then from host to the target device.
auto bufferSize = getSize();
std::vector<char> hostBuf(bufferSize);

nit: it may be worth considering a USM allocation instead of a heap allocation here, as on line 100

Comment on lines +369 to +372
for (uint32_t i = 0; i < waitListView.num; i++) {
  ZE2UR_CALL_THROWS(zeEventHostSynchronize,
                    (waitListView.handles[i], UINT64_MAX));
}

I don't think this will work. The operation also needs to be ordered with respect to the command list itself, so something like this would be better:

  if (numWaitEvents > 0) {
    ZE2UR_CALL(zeCommandListAppendWaitOnEvents,
               (zeCommandList.get(), numWaitEvents, pWaitEvents));
  }
  ZE2UR_CALL(zeCommandListHostSynchronize, (zeCommandList.get(), UINT64_MAX));

auto bufferSize = getSize();
std::vector<char> hostBuf(bufferSize);

UR_CALL_THROWS(synchronousZeCopy(hContext, activeAllocationDevice,

I don't like the fact that this is synchronous. Can you explore what it would take to make it async? I think we'd need to keep the allocation somewhere.
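
One possible shape for "keeping the allocation somewhere" is sketched below, purely as an illustration and not as a proposal taken from this PR. It assumes the staging buffer is owned by a tracker until a completion query reports that the copies have finished. `PendingMigration`, `StagingTracker`, and the `isComplete` callback are hypothetical; in a real adapter the callback would query a completion event.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical record of one in-flight migration.
struct PendingMigration {
  std::shared_ptr<std::vector<char>> staging; // keeps the host buffer alive
  std::function<bool()> isComplete;           // models querying a completion event
};

// Hypothetical owner of staging buffers used by asynchronous copies.
class StagingTracker {
public:
  // Take ownership of a staging buffer until `isComplete` reports true.
  void track(std::shared_ptr<std::vector<char>> staging,
             std::function<bool()> isComplete) {
    assert(staging && isComplete);
    pending_.push_back({std::move(staging), std::move(isComplete)});
  }

  // Release the buffers whose copies have finished.
  // Returns the number of buffers released by this call.
  std::size_t reclaim() {
    std::size_t released = 0;
    for (auto it = pending_.begin(); it != pending_.end();) {
      if (it->isComplete()) {
        it = pending_.erase(it);
        ++released;
      } else {
        ++it;
      }
    }
    return released;
  }

  std::size_t inFlight() const { return pending_.size(); }

private:
  std::vector<PendingMigration> pending_;
};
```

With this arrangement the function that enqueues the copies can return immediately, and `reclaim()` can be called at any later synchronization point. Where such a tracker would live (buffer, queue, or context) is left open here.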
