Skip to content

Conversation

@Phylliida
Copy link
Contributor

@Phylliida Phylliida commented Nov 4, 2025

This adds extra functions

ggml_pad_circular
ggml_pad_ext_circular

That have equivalent signatures to the non-circular versions (I considered modifying the existing ones, but didn't want to break existing code). Instead of padding with zeros, they act "on a torus" and loop x and y around.

I implemented this for CUDA, CPU, and Vulkan, as those are the primary backends people use in KoboldCpp/Stable Diffusion Cpp to generate images. For other backends, it'll fall back to non-circular.

This can be used to make seamless textures, see leejet/stable-diffusion.cpp#914 for an example and the changes needed on the image generation side. For some models (Stable Diffusion) simply calling the circular functions is sufficient, for other models (Qwen Image) you need to modify Rope embeddings slightly as well (so they cleanly loop).

I ran CI tests and added tests for these, but happy to answer any questions/modify things as needed.

(Edit notes: a previous version of this pr had also circular for conv, but we've decided that only circular pad is needed)

@Acly
Copy link
Collaborator

Acly commented Nov 4, 2025

I am wondering, is it possible to add only a variant of ggml_pad with circular padding, use that as separate operation before the convolutions, then do the convolution without padding? How much slower is that?

Adding circular padding natively to all convolutions on all/most backends is a lot of investment. I'm not sure how common it is, so it would be interesting to know the trade-off.

@Phylliida
Copy link
Contributor Author

Phylliida commented Nov 15, 2025

I am wondering, is it possible to add only a variant of ggml_pad with circular padding, use that as separate operation before the convolutions, then do the convolution without padding? How much slower is that?

Adding circular padding natively to all convolutions on all/most backends is a lot of investment. I'm not sure how common it is, so it would be interesting to know the trade-off.

Huh, yes that's a very good suggestion and seems to work well.

For Qwen Image, using Vulkan on a 3090, I get 1.28s/it using pad ahead of time, vs 1.27s/it using circular convs, which is within rounding error, very little performance penalty. I'll update the PR to only do circular padding since that's all we need.

@Phylliida Phylliida changed the title Add circular tiling support to conv2d and pad, for Vulkan, CUDA, and CPU (used for making seamless textures) Add circular tiling support to pad, for Vulkan, CUDA, and CPU (used for making seamless textures) Nov 15, 2025
@Phylliida
Copy link
Contributor Author

Ok it should be ready now

Copy link
Collaborator

@0cc4m 0cc4m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Vulkan change is fine.

@am17an
Copy link
Collaborator

am17an commented Nov 30, 2025

CUDA changes look good. @ggerganov needs to approve for the ggml changes and merge

test_cases.emplace_back(new test_group_norm_mul_add(GGML_TYPE_F32, {9, 9, 1280, 1}));
test_cases.emplace_back(new test_acc());
test_cases.emplace_back(new test_pad());
test_cases.emplace_back(new test_pad(GGML_TYPE_F32, {33, 17, 2, 1}, 4, 3, true)); // circular
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add test_pad_ext() with circular == true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Phylliida and others added 3 commits December 1, 2025 16:02
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
@ggerganov
Copy link
Member

Make sure to gate the support for this operator in all backends that implement it. For example, in Metal add:

diff --git a/ggml/src/ggml-metal/ggml-metal-device.m b/ggml/src/ggml-metal/ggml-metal-device.m
index 09b1b5031..5ba128e3e 100644
--- a/ggml/src/ggml-metal/ggml-metal-device.m
+++ b/ggml/src/ggml-metal/ggml-metal-device.m
@@ -898,6 +898,11 @@ bool ggml_metal_device_supports_op(ggml_metal_device_t dev, const struct ggml_te
         case GGML_OP_POOL_2D:
             return op->src[0]->type == GGML_TYPE_F32;
         case GGML_OP_PAD:
+            // TODO: add circular padding support https://github.com/ggml-org/llama.cpp/pull/16985
+            if (ggml_get_op_params_i32(op, 8) != 0) {
+                return false;
+            }
+
             return (ggml_get_op_params_i32(op, 0) == 0) && (ggml_get_op_params_i32(op, 2) == 0) &&
                    (ggml_get_op_params_i32(op, 4) == 0) && (ggml_get_op_params_i32(op, 6) == 0);
         case GGML_OP_PAD_REFLECT_1D:

Similar changes to the rest of the backends that require it.

@Phylliida
Copy link
Contributor Author

Make sure to gate the support for this operator in all backends that implement it. For example, in Metal add:

diff --git a/ggml/src/ggml-metal/ggml-metal-device.m b/ggml/src/ggml-metal/ggml-metal-device.m
index 09b1b5031..5ba128e3e 100644
--- a/ggml/src/ggml-metal/ggml-metal-device.m
+++ b/ggml/src/ggml-metal/ggml-metal-device.m
@@ -898,6 +898,11 @@ bool ggml_metal_device_supports_op(ggml_metal_device_t dev, const struct ggml_te
         case GGML_OP_POOL_2D:
             return op->src[0]->type == GGML_TYPE_F32;
         case GGML_OP_PAD:
+            // TODO: add circular padding support https://github.com/ggml-org/llama.cpp/pull/16985
+            if (ggml_get_op_params_i32(op, 8) != 0) {
+                return false;
+            }
+
             return (ggml_get_op_params_i32(op, 0) == 0) && (ggml_get_op_params_i32(op, 2) == 0) &&
                    (ggml_get_op_params_i32(op, 4) == 0) && (ggml_get_op_params_i32(op, 6) == 0);
         case GGML_OP_PAD_REFLECT_1D:

Similar changes to the rest of the backends that require it.

Ok done, I added gating for cann, metal, opencl, and sycl

@github-actions github-actions bot added SYCL https://en.wikipedia.org/wiki/SYCL - GPU programming language Apple Metal https://en.wikipedia.org/wiki/Metal_(API) Ascend NPU issues specific to Ascend NPUs OpenCL Issues specific to the OpenCL backend labels Dec 4, 2025
@CISC CISC merged commit 09c7c50 into ggml-org:master Dec 6, 2025
75 of 78 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Apple Metal https://en.wikipedia.org/wiki/Metal_(API) Ascend NPU issues specific to Ascend NPUs ggml changes relating to the ggml tensor library for machine learning Nvidia GPU Issues specific to Nvidia GPUs OpenCL Issues specific to the OpenCL backend SYCL https://en.wikipedia.org/wiki/SYCL - GPU programming language testing Everything test related Vulkan Issues specific to the Vulkan backend

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants