-
Notifications
You must be signed in to change notification settings - Fork 14.1k
Add circular tiling support to pad, for Vulkan, CUDA, and CPU (used for making seamless textures) #16985
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I am wondering, is it possible to add only a variant of Adding circular padding natively to all convolutions on all/most backends is a lot of investment. I'm not sure how common it is, so it would be interesting to know the trade-off. |
Huh, yes that's a very good suggestion and seems to work well. For Qwen Image, using Vulkan on a 3090, I get 1.28s/it using pad ahead of time, vs 1.27s/it using circular convs, which is within rounding error, very little performance penalty. I'll update the PR to only do circular padding since that's all we need. |
|
Ok it should be ready now |
0cc4m
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Vulkan change is fine.
|
CUDA changes look good. @ggerganov needs to approve for the ggml changes and merge |
| test_cases.emplace_back(new test_group_norm_mul_add(GGML_TYPE_F32, {9, 9, 1280, 1})); | ||
| test_cases.emplace_back(new test_acc()); | ||
| test_cases.emplace_back(new test_pad()); | ||
| test_cases.emplace_back(new test_pad(GGML_TYPE_F32, {33, 17, 2, 1}, 4, 3, true)); // circular |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add test_pad_ext() with circular == true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, I added it to the loop here https://github.com/ggml-org/llama.cpp/pull/16985/files#diff-2749fdb8974ec96afa18444a9d546409318b0a862709139b677eee468c479578R7744
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
Make sure to gate the support for this operator in all backends that implement it. For example, in Metal add: diff --git a/ggml/src/ggml-metal/ggml-metal-device.m b/ggml/src/ggml-metal/ggml-metal-device.m
index 09b1b5031..5ba128e3e 100644
--- a/ggml/src/ggml-metal/ggml-metal-device.m
+++ b/ggml/src/ggml-metal/ggml-metal-device.m
@@ -898,6 +898,11 @@ bool ggml_metal_device_supports_op(ggml_metal_device_t dev, const struct ggml_te
case GGML_OP_POOL_2D:
return op->src[0]->type == GGML_TYPE_F32;
case GGML_OP_PAD:
+ // TODO: add circular padding support https://github.com/ggml-org/llama.cpp/pull/16985
+ if (ggml_get_op_params_i32(op, 8) != 0) {
+ return false;
+ }
+
return (ggml_get_op_params_i32(op, 0) == 0) && (ggml_get_op_params_i32(op, 2) == 0) &&
(ggml_get_op_params_i32(op, 4) == 0) && (ggml_get_op_params_i32(op, 6) == 0);
case GGML_OP_PAD_REFLECT_1D:Similar changes to the rest of the backends that require it. |
Ok done, I added gating for cann, metal, opencl, and sycl |
This adds extra functions
That have equivalent signatures to the non-circular versions (I considered modifying the existing ones, but didn't want to break existing code). Instead of padding with zeros, they act "on a torus" and loop x and y around.
I implemented this for CUDA, CPU, and Vulkan, as those are the primary backends people use in KoboldCpp/Stable Diffusion Cpp to generate images. For other backends, it'll fall back to non-circular.
This can be used to make seamless textures, see leejet/stable-diffusion.cpp#914 for an example and the changes needed on the image generation side. For some models (Stable Diffusion) simply calling the circular functions is sufficient, for other models (Qwen Image) you need to modify Rope embeddings slightly as well (so they cleanly loop).
I ran CI tests and added tests for these, but happy to answer any questions/modify things as needed.
(Edit notes: a previous version of this pr had also circular for conv, but we've decided that only circular pad is needed)