[UR][L0] Migrate discrete buffer through host when P2P is not accessible by ldorau · Pull Request #22010 · intel/llvm

ldorau · 2026-05-13T11:18:14Z

When a buffer on a discrete GPU needs to be accessed from a different
device and P2P access is not enabled, migrate the data through a
temporary host buffer instead of returning UR_RESULT_ERROR_UNSUPPORTED_FEATURE.
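The fallback described above can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not the adapter code: `Device`, `PeerAccessFn`, `MigrationPath` and `migrateBuffer` are hypothetical names, and plain `std::vector<char>` objects stand in for device allocations. It shows the decision: copy directly when the devices are P2P-accessible, otherwise stage through a temporary host buffer instead of failing.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a device-local allocation; NOT the Unified Runtime
// or Level Zero API. `memory` stands in for the buffer's bytes on device `id`.
struct Device {
  int id;
  std::vector<char> memory;
};

// Hypothetical peer-access query; a real adapter would consult the driver
// or its own P2P state instead.
using PeerAccessFn = bool (*)(const Device &src, const Device &dst);

enum class MigrationPath { DirectP2P, ThroughHost };

// Move `size` bytes of the buffer from `src` to `dst`. With P2P access a
// single device-to-device copy suffices; without it, the data is staged in a
// temporary host buffer (device -> host -> device) instead of the call
// failing with an "unsupported feature" error.
MigrationPath migrateBuffer(const Device &src, Device &dst, std::size_t size,
                            PeerAccessFn peerAccessible) {
  assert(src.memory.size() >= size && dst.memory.size() >= size);

  if (peerAccessible(src, dst)) {
    std::copy_n(src.memory.begin(), size, dst.memory.begin());  // one hop
    return MigrationPath::DirectP2P;
  }

  std::vector<char> hostStaging(size);                         // temporary host buffer
  std::copy_n(src.memory.begin(), size, hostStaging.begin());  // device -> host
  std::copy_n(hostStaging.begin(), size, dst.memory.begin());  // host -> device
  return MigrationPath::ThroughHost;
}
```

The host path costs two copies instead of one, which is why it only makes sense as a fallback when direct peer access is unavailable.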

Before migrating, wait for pending operations (from the wait list) to
complete, ensuring that prior kernel writes to the buffer are visible.
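The waiting step can be sketched with standard futures standing in for events. This is a hypothetical model: `Event`, `waitForDependencies` and `snapshotForMigration` are illustrative names, not adapter functions. It shows only a host-side wait on the listed events. It does not order the copy against other work already submitted to a command list, which is the concern raised later in review.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Hypothetical stand-in for an event: it becomes ready when the operation
// that produces it (e.g. a kernel writing the buffer) has finished.
using Event = std::shared_future<void>;

// Block the host until every event in the wait list has completed, so that
// writes made by earlier operations are visible before the buffer is read.
void waitForDependencies(const std::vector<Event> &waitList) {
  for (const Event &e : waitList) {
    assert(e.valid() && "wait-list entry must refer to an operation");
    e.wait();
  }
}

// Snapshot the source buffer for migration only after its producers are
// done; reading it earlier could observe stale data.
std::vector<int> snapshotForMigration(const std::vector<int> &src,
                                      const std::vector<Event> &waitList) {
  waitForDependencies(waitList);
  return src;  // copy made after all dependencies completed
}
```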

Fixes: #22007
Fixes: #22008

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>

ldorau · 2026-05-14T07:24:55Z

Please review @intel/unified-runtime-reviewers-level-zero

mateuszpn · 2026-05-14T09:35:16Z

+    // Migrate buffer through the host: copy from the current device to a
+    // temporary host buffer, then from host to the target device.
+    auto bufferSize = getSize();
+    std::vector<char> hostBuf(bufferSize);


nit: it may be worth considering a USM allocation instead of a heap allocation here, as is done on line 100.
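One way to act on this nit is to hide the allocation behind an RAII handle so that the `std::vector<char>` can be swapped for a USM host allocation without touching the copy logic. This is a sketch under assumptions: `HostAllocator`, `HostRelease`, `StagingBuffer` and `makeStagingBuffer` are hypothetical names, and the real hooks would presumably wrap something like `urUSMHostAlloc`/`urUSMFree` (or `zeMemAllocHost`/`zeMemFree` at the Level Zero layer). What line 100 of the patched file actually uses is not shown here. USM host memory is typically page-locked, which usually lets the driver transfer it without an extra bounce, and a raw allocation also avoids the zero-fill that `std::vector<char>(n)` performs.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical allocator hooks. In the adapter these would presumably wrap a
// USM host allocation and its matching free; any alloc/free pair works here.
struct HostAllocator {
  void *(*alloc)(std::size_t);
  void (*release)(void *);
};

// Deleter that hands memory back to the allocator that produced it.
// std::unique_ptr only invokes it for non-null pointers.
struct HostRelease {
  void (*release)(void *);
  void operator()(char *p) const { release(p); }
};

using StagingBuffer = std::unique_ptr<char[], HostRelease>;

// Allocate an uninitialized staging buffer (std::vector<char>(n), by
// contrast, zero-fills all n bytes) that is freed automatically when it
// goes out of scope.
StagingBuffer makeStagingBuffer(const HostAllocator &a, std::size_t size) {
  char *p = static_cast<char *>(a.alloc(size));
  assert(p != nullptr && "host allocation failed");
  return StagingBuffer(p, HostRelease{a.release});
}
```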

pbalcer · 2026-05-14T10:39:08Z

+    for (uint32_t i = 0; i < waitListView.num; i++) {
+      ZE2UR_CALL_THROWS(zeEventHostSynchronize,
+                        (waitListView.handles[i], UINT64_MAX));
+    }


I don't think this will work. The operation also needs to be ordered with regard to the command list itself, so something like this would be better:

if (numWaitEvents > 0) {
  ZE2UR_CALL(zeCommandListAppendWaitOnEvents,
             (zeCommandList.get(), numWaitEvents, pWaitEvents));
}
ZE2UR_CALL(zeCommandListHostSynchronize,
           (zeCommandList.get(), UINT64_MAX));
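The ordering argument can be made concrete with a toy in-order command list. This is a hypothetical model (`InOrderList` and its methods are illustrative, not Level Zero calls). It shows why appending the wait to the list and then host-synchronizing the list is stronger than host-waiting on the wait-list events alone: synchronizing the list also drains commands appended earlier that no event in the wait list covers.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <vector>

// Toy in-order "command list": commands run one at a time, in submission
// order, on a worker thread. Hypothetical model, not the Level Zero API.
class InOrderList {
public:
  InOrderList() : worker_([this] { run(); }) {}
  ~InOrderList() {
    { std::lock_guard<std::mutex> lk(m_); stop_ = true; }
    cv_.notify_all();
    worker_.join();  // remaining commands are drained before the thread exits
  }

  void append(std::function<void()> cmd) {
    { std::lock_guard<std::mutex> lk(m_); q_.push_back(std::move(cmd)); ++pending_; }
    cv_.notify_all();
  }

  // Analogue of appending a wait-on-events command: it runs after everything
  // already in the list and before anything appended later.
  void appendWaitOnEvents(std::vector<std::shared_future<void>> events) {
    append([events = std::move(events)] { for (auto &e : events) e.wait(); });
  }

  // Analogue of host-synchronizing the whole list: block until every command
  // submitted so far (not only those covered by a wait list) has finished.
  void hostSynchronize() {
    std::unique_lock<std::mutex> lk(m_);
    done_.wait(lk, [this] { return pending_ == 0; });
  }

private:
  void run() {
    for (;;) {
      std::function<void()> cmd;
      {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return stop_ || !q_.empty(); });
        if (q_.empty()) return;  // stop requested and nothing left to run
        cmd = std::move(q_.front());
        q_.pop_front();
      }
      cmd();
      { std::lock_guard<std::mutex> lk(m_); --pending_; }
      done_.notify_all();
    }
  }

  std::mutex m_;
  std::condition_variable cv_, done_;
  std::deque<std::function<void()>> q_;
  std::size_t pending_ = 0;
  bool stop_ = false;
  std::thread worker_;  // declared last so it starts after the members above
};
```

In the model, a write appended before the migration is visible after `hostSynchronize()` even though no event in the wait list refers to it; waiting only on those events would give no such guarantee.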

pbalcer · 2026-05-14T10:44:20Z

+    auto bufferSize = getSize();
+    std::vector<char> hostBuf(bufferSize);
+
+    UR_CALL_THROWS(synchronousZeCopy(hContext, activeAllocationDevice,


I don't like the fact that this is synchronous. Can you explore what it would take to make it async? I think we'd need to keep the allocation somewhere.
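Making the migration asynchronous means the staging buffer can no longer be freed when the enqueue call returns, because the device-to-host and host-to-device copies may still be in flight. A minimal sketch of "keeping the allocation somewhere", assuming each migration exposes a completion event (modelled here as `std::shared_future<void>`), is a deferred-release list. `PendingStagingBuffers`, `keepAlive` and `reclaimCompleted` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical bookkeeping for an asynchronous host-staged migration: each
// staging buffer is parked together with the event that signals the end of
// its copies, and is freed only once that event has completed.
class PendingStagingBuffers {
public:
  using Completion = std::shared_future<void>;

  // Take ownership of a staging buffer that must outlive `done`.
  void keepAlive(std::unique_ptr<char[]> buf, Completion done) {
    assert(done.valid() && "buffer needs a completion event");
    pending_.emplace_back(std::move(buf), std::move(done));
  }

  // Free every buffer whose copies have finished (non-blocking poll);
  // returns how many buffers were released.
  std::size_t reclaimCompleted() {
    std::size_t freed = 0;
    for (auto it = pending_.begin(); it != pending_.end();) {
      if (it->second.wait_for(std::chrono::seconds(0)) ==
          std::future_status::ready) {
        it = pending_.erase(it);  // unique_ptr releases the host memory
        ++freed;
      } else {
        ++it;
      }
    }
    return freed;
  }

  std::size_t size() const { return pending_.size(); }

private:
  std::vector<std::pair<std::unique_ptr<char[]>, Completion>> pending_;
};
```

Where `reclaimCompleted()` would be called (for example on each enqueue or on queue synchronization) is a design choice left open here. Reusing completed buffers instead of freeing them would be a natural refinement.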

ldorau requested a review from a team as a code owner May 13, 2026 11:18

ldorau mentioned this pull request May 13, 2026

enqueue-test/urEnqueueMemBuffer* tests often fail on UR L0v2 adapter #22008 (Open)

ldorau requested a review from kswiecicki May 13, 2026 12:29

mateuszpn reviewed May 14, 2026

pbalcer reviewed May 14, 2026

ldorau wants to merge 1 commit into intel:sycl from ldorau:URL0_Migrate_discrete_buffer_through_host_when_P2P_is_not_accessible

3 participants