Align 'linalg-to-xegpu' pass with patched XeGPU dialect #201
616c9e1 to 716af02
| loc, vecLoadType, tile, vnniAxisAttr, transpose, | ||
| loc, vecLoadType, tile, packedAttr, transpose, transpose_bit, |
vnniAxis -> packedAttr: instead of a VNNI axis (0 or 1), specify a "packed" attribute, which is equivalent to vnni_axis=0.
transpose_bit: allows transposing data while loading. It isn't used by this lowering pass.
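To illustrate what packing along axis 0 means for the data layout, here is a minimal host-side sketch. It assumes the packed layout is (K / factor) x N x factor, which mirrors the 32x16 to 32x8x2 shape change that vnni_axis=1 produced for tile A in this PR's tests. The helper name `packVnniAxis0` and its signature are hypothetical and are not part of XeGPU or IMEX.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an XeGPU or IMEX API): repacks a row-major
// rows x cols matrix into a (rows / factor) x cols x factor layout, so that
// `factor` consecutive rows of one column end up adjacent in memory.
// For 16-bit element types the packing factor would typically be 2.
template <typename T>
std::vector<T> packVnniAxis0(const std::vector<T> &src, std::size_t rows,
                             std::size_t cols, std::size_t factor) {
  assert(factor != 0 && rows % factor == 0 && "rows must be a multiple of factor");
  assert(src.size() == rows * cols && "source size must match the shape");

  std::vector<T> dst(src.size());
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      // Destination index in the (rows / factor) x cols x factor layout.
      std::size_t packedIdx = ((r / factor) * cols + c) * factor + (r % factor);
      dst[packedIdx] = src[r * cols + c];
    }
  }
  return dst;
}
```

For a 4x2 input holding 0..7 in row-major order and a factor of 2, rows 0 and 1 are interleaved column by column, followed by rows 2 and 3.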
| // Load A sub-tiles. | ||
| SmallVector<Value> loadVecA = | ||
| loadNdDescTiles(rewriter, loc, tilesA, readCacheHint, vnniConfA); |
vnniConfA can't be used during loading since vnniAxis=1 is no longer supported. However, we still need this config to compute proper tiles for xegpu.dpas later in the code.
aeada62 to 435b520
Signed-off-by: dchigarev <dmitry.chigarev@intel.com>
435b520 to 2778459
f78f6d2 to 829b9d4
| // Create output initial value load tiles. | ||
| // CHECK: %[[rootC:.+]] = xegpu.create_nd_tdesc %[[C]] | ||
| // CHECK: %[[tC:.+]] = xegpu.update_nd_offset %[[rootC]], [0, 0] | ||
| // CHECK: %[[tC:.+]] = xegpu.update_nd_offset %[[rootC]], [%c0, %c0] |
IMEX doesn't support constant offsets (see intel/mlir-extensions#815).
| // Extract DPAS-sized chunks from larger loaded tile A. | ||
| // Tile B is already in the correct shape. | ||
| // CHECK: %[[vA_flat:.+]] = vector.shape_cast %[[vA]] : vector<32x8x2xf16> to vector<512xf16> | ||
| // CHECK: %[[vA_flat:.+]] = vector.shape_cast %[[vA]] : vector<32x16xf16> to vector<512xf16> |
We no longer load the A matrix via vnni_axis=1 (see packed_attr).
The IMEX changes are merged in Menooker:dev.
| # required functionality is merged. | ||
| gc_fetch_content(imex 496b240093b5e132b60c5ee69878300fe69be300 https://github.com/Menooker/mlir-extensions | ||
| SET IMEX_CHECK_LLVM_VERSION=ON IMEX_ENABLE_L0_RUNTIME=0 | ||
| gc_fetch_content(imex d5bbd635dee500b8cff138686833bacfac5ade78 https://github.com/Menooker/mlir-extensions |
Updated to the latest commit in the dev branch.
| cl_platform_id platform; // OpenCL platform | ||
| cl_device_id device; // device ID | ||
| CL_SAFE_CALL(clGetPlatformIDs(1, &platform, NULL)); | ||
| CL_SAFE_CALL(clGetDeviceIDs(platform, *devtype, 1, &device, NULL)); | ||
| return device; |
The old logic searched for a device of the requested type in only one platform (and couldn't find any GPU on my machine). I rewrote the logic to iterate over all available platforms and return the first suitable device.
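The search order described above can be sketched without an OpenCL dependency. The `Platform` and `Device` types below are hypothetical stand-ins for `cl_platform_id` and `cl_device_id`, and `findFirstDevice` is an illustrative name, not the function from this PR. In real code the platform list would come from `clGetPlatformIDs` and the per-platform lookup from `clGetDeviceIDs`, which reports `CL_DEVICE_NOT_FOUND` when a platform has no device of the requested type.

```cpp
#include <vector>

// Hypothetical stand-ins for the OpenCL handles, used only for illustration.
enum class DeviceType { CPU, GPU };

struct Device {
  int id;
  DeviceType type;
};

struct Platform {
  std::vector<Device> devices;
};

// Walks every platform in order and returns the first device of the
// requested type, or nullptr if no platform exposes one. Looking at only
// platforms[0] would miss a GPU that is exposed by a later platform.
const Device *findFirstDevice(const std::vector<Platform> &platforms,
                              DeviceType wanted) {
  for (const Platform &platform : platforms) {
    for (const Device &device : platform.devices) {
      if (device.type == wanted)
        return &device;
    }
  }
  return nullptr;
}
```

Returning a null result instead of aborting on the first platform without a match is what lets the caller report a single "no suitable device" error after all platforms have been checked.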
Closes #192
This PR updates the linalg-to-xegpu pass to make it compatible with the xegpu-to-vc-func pass from IMEX. The PR also adds a simple e2e test for the linalg->xegpu->gpu exe pipeline.