Add circular tiling support to pad, for Vulkan, CUDA, and CPU (used for making seamless textures) #16985

Phylliida · 2025-11-04T00:22:49Z

This adds two functions:

ggml_pad_circular
ggml_pad_ext_circular

They have equivalent signatures to the non-circular versions (I considered modifying the existing ones, but didn't want to break existing code). Instead of padding with zeros, they act "on a torus" and wrap x and y around.

I implemented this for CUDA, CPU, and Vulkan, as those are the primary backends people use in KoboldCpp and stable-diffusion.cpp to generate images. Other backends fall back to non-circular padding.

This can be used to make seamless textures; see leejet/stable-diffusion.cpp#914 for an example and the changes needed on the image generation side. For some models (Stable Diffusion), simply calling the circular functions is sufficient. For other models (Qwen Image), you also need to modify the RoPE embeddings slightly so they loop cleanly.

I ran the CI tests and added tests for these functions. I'm happy to answer any questions or modify things as needed.

(Edit note: a previous version of this PR also added circular mode for conv, but we've decided that only circular pad is needed.)

Acly · 2025-11-04T09:44:09Z

I am wondering: is it possible to add only a variant of ggml_pad with circular padding, use that as a separate operation before the convolutions, and then do the convolution without padding? How much slower is that?

Adding circular padding natively to all convolutions on all/most backends is a lot of investment. I'm not sure how common it is, so it would be interesting to know the trade-off.

Phylliida · 2025-11-15T00:17:14Z

Huh, yes that's a very good suggestion and seems to work well.

For Qwen Image, using Vulkan on a 3090, I get 1.28s/it when padding ahead of time vs. 1.27s/it with circular convs. That difference is within rounding error, so there is very little performance penalty. I'll update the PR to do only circular padding, since that's all we need.
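The reason the two approaches are interchangeable can be shown with a small standalone 1D sketch: circular padding followed by a convolution without padding gives the same result as a convolution that wraps its reads internally. None of this is ggml code; `pad_circular_1d`, `conv_valid_1d`, and `conv_circular_1d` are hypothetical names, and the sketch assumes an odd kernel size and stride 1.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of ggml): wrap index i into [0, n).
static int wrap_index(int i, int n) {
    assert(n > 0);
    return ((i % n) + n) % n;
}

// Circular padding in 1D: lp elements on the left, rp on the right.
static std::vector<float> pad_circular_1d(const std::vector<float> & x, int lp, int rp) {
    const int n = static_cast<int>(x.size());
    std::vector<float> out(n + lp + rp);
    for (int i = 0; i < n + lp + rp; ++i) {
        out[i] = x[wrap_index(i - lp, n)];
    }
    return out;
}

// Cross-correlation with no padding: output length is n - m + 1.
static std::vector<float> conv_valid_1d(const std::vector<float> & x, const std::vector<float> & k) {
    const int n = static_cast<int>(x.size());
    const int m = static_cast<int>(k.size());
    std::vector<float> out(n - m + 1, 0.0f);
    for (int i = 0; i < n - m + 1; ++i) {
        for (int j = 0; j < m; ++j) {
            out[i] += x[i + j] * k[j];
        }
    }
    return out;
}

// Cross-correlation with wrap-around reads built into the loop (odd kernel size).
static std::vector<float> conv_circular_1d(const std::vector<float> & x, const std::vector<float> & k) {
    const int n = static_cast<int>(x.size());
    const int m = static_cast<int>(k.size());
    const int r = m / 2;
    std::vector<float> out(n, 0.0f);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < m; ++j) {
            out[i] += x[wrap_index(i + j - r, n)] * k[j];
        }
    }
    return out;
}
```

With `x = {1, 2, 3, 4}` and `k = {1, 0, -1}`, `conv_valid_1d(pad_circular_1d(x, 1, 1), k)` and `conv_circular_1d(x, k)` both produce `{2, -2, -2, 2}`. The separate pad costs an extra pass over the tensor, which is consistent with the small timing difference reported above.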

Phylliida · 2025-11-19T02:02:30Z

Ok it should be ready now

0cc4m

The Vulkan change is fine.

am17an · 2025-11-30T15:44:15Z

CUDA changes look good. @ggerganov needs to approve the ggml changes and merge.

ggerganov · 2025-12-01T09:27:59Z

tests/test-backend-ops.cpp

    test_cases.emplace_back(new test_group_norm_mul_add(GGML_TYPE_F32, {9, 9, 1280, 1}));
    test_cases.emplace_back(new test_acc());
    test_cases.emplace_back(new test_pad());
+    test_cases.emplace_back(new test_pad(GGML_TYPE_F32, {33, 17, 2, 1}, 4, 3, true)); // circular


Should we add test_pad_ext() with circular == true?

Phylliida · 2025-12-02

Sure, I added it to the loop here: https://github.com/ggml-org/llama.cpp/pull/16985/files#diff-2749fdb8974ec96afa18444a9d546409318b0a862709139b677eee468c479578R7744

ggerganov · 2025-12-02T08:28:55Z

Make sure to gate the support for this operator in all backends that implement it. For example, in Metal add:

diff --git a/ggml/src/ggml-metal/ggml-metal-device.m b/ggml/src/ggml-metal/ggml-metal-device.m
index 09b1b5031..5ba128e3e 100644
--- a/ggml/src/ggml-metal/ggml-metal-device.m
+++ b/ggml/src/ggml-metal/ggml-metal-device.m
@@ -898,6 +898,11 @@ bool ggml_metal_device_supports_op(ggml_metal_device_t dev, const struct ggml_te
         case GGML_OP_POOL_2D:
             return op->src[0]->type == GGML_TYPE_F32;
         case GGML_OP_PAD:
+            // TODO: add circular padding support https://github.com/ggml-org/llama.cpp/pull/16985
+            if (ggml_get_op_params_i32(op, 8) != 0) {
+                return false;
+            }
+
             return (ggml_get_op_params_i32(op, 0) == 0) && (ggml_get_op_params_i32(op, 2) == 0) &&
                    (ggml_get_op_params_i32(op, 4) == 0) && (ggml_get_op_params_i32(op, 6) == 0);
         case GGML_OP_PAD_REFLECT_1D:

Make similar changes in the other backends that require it.
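The gating pattern in the diff above can be sketched in isolation as follows. This is a hypothetical stand-in, not ggml's tensor struct or any backend's real code: `pad_op_params` and `backend_supports_pad` are invented names, and the parameter layout (indices 0, 2, 4, 6 as left paddings, index 8 as the circular flag) is an assumption inferred from the checks in the diff.

```cpp
#include <cstdint>

// Hypothetical stand-in for a pad op's integer parameters (not ggml's struct).
// Assumed layout, inferred from the diff above: v[0], v[2], v[4], v[6] are the
// left paddings per dimension and v[8] is nonzero when circular mode is set.
struct pad_op_params {
    int32_t v[9];
};

// Mirrors the shape of the Metal check: report circular pads as unsupported,
// then keep the existing rule that only right-side padding is handled.
static bool backend_supports_pad(const pad_op_params & p) {
    if (p.v[8] != 0) {
        return false; // circular padding is not implemented on this backend
    }
    return p.v[0] == 0 && p.v[2] == 0 && p.v[4] == 0 && p.v[6] == 0;
}
```

The point of the early return is that a backend without a circular kernel reports the op as unsupported rather than silently producing zero-padded output.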

Phylliida · 2025-12-04T21:40:16Z

OK, done. I added gating for CANN, Metal, OpenCL, and SYCL.

Phylliida added 5 commits November 3, 2025 13:27

Feat: Added vulkan circular tiling support

f6ac084

Feat: Added cpu circular

d7f5958

Feat: Added cuda kernels

1b62b49

Added tests

60bed3b

Added tests

5700a4e

Phylliida requested review from 0cc4m, ggerganov and slaren as code owners November 4, 2025 00:22

github-actions bot added the labels testing (Everything test related), Nvidia GPU (Issues specific to Nvidia GPUs), Vulkan (Issues specific to the Vulkan backend), and ggml (changes relating to the ggml tensor library for machine learning) Nov 4, 2025

DajanaV mentioned this pull request Nov 4, 2025

UPSTREAM PR #16985: Add circular tiling support to conv2d and pad, for Vulkan, CUDA, and CPU (used for making seamless textures) auroralabs-loci/llama.cpp#67

Open

This was referenced Nov 4, 2025

Seamless texture generation support for qwen image leejet/stable-diffusion.cpp#914

Open

Add circular tiling support (for making seamless textures) ggml-org/ggml#1374

Closed

jeffbolznv reviewed Nov 4, 2025

ggml/include/ggml.h (outdated, resolved)

ggml/src/ggml-vulkan/vulkan-shaders/conv2d_mm.comp (outdated, resolved)

Merge branch 'master' into master

a894631

Phylliida added 4 commits November 14, 2025 16:35

Removed non-pad operations

9861a3d

Removed unneded changes

38f8724

removed backend non pad tests

d4a664b

Merge branch 'ggml-org:master' into master

a785537

Phylliida changed the title from "Add circular tiling support to conv2d and pad, for Vulkan, CUDA, and CPU (used for making seamless textures)" to "Add circular tiling support to pad, for Vulkan, CUDA, and CPU (used for making seamless textures)" Nov 15, 2025

0cc4m reviewed Nov 15, 2025

tests/test-backend-ops.cpp (outdated, resolved)

Phylliida added 2 commits November 18, 2025 12:55

Merge branch 'ggml-org:master' into master

d9dc234

Update test-backend-ops.cpp

552e5b2

0cc4m reviewed Nov 19, 2025

tests/test-backend-ops.cpp (outdated, resolved)

tests/test-backend-ops.cpp (outdated, resolved)

Fixed comment on pad test

1c69e4e

Phylliida added 8 commits November 29, 2025 19:05

don't need to fix the padding

80915a1

make circular bool

1721a2b

duplicate again

89559a1

rename vulkan to wrap around

af56c82

Don't need indent

ec892ec

moved to const expr

f295d28

removed unneded extra line break

bb8ecad

More readable method calls

4d20856

am17an approved these changes Nov 30, 2025

ggml/src/ggml-cpu/ops.cpp (outdated, resolved)

ggml/src/ggml-cuda/pad.cu (outdated, resolved)

Phylliida added 2 commits November 29, 2025 21:11

Minor wording changes

b850c04

Added final newline

801cd84

ggerganov approved these changes Dec 1, 2025

Phylliida and others added 3 commits December 1, 2025 16:02

Update ggml/include/ggml.h

7fd9ea3

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Update ggml/include/ggml.h

b29544d

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

Added circular pad ext tests

2f3d4ba

Phylliida added 2 commits December 4, 2025 13:30

Gate non circular pad devices

624433d

Cleaned gating of non-circular pad devices

8515811

Phylliida requested review from lhez and max-krasnyansky as code owners December 4, 2025 21:38

github-actions bot added the labels SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language), Apple Metal (https://en.wikipedia.org/wiki/Metal_(API)), Ascend NPU (issues specific to Ascend NPUs), and OpenCL (Issues specific to the OpenCL backend) Dec 4, 2025

CISC merged commit 09c7c50 into ggml-org:master Dec 6, 2025
75 of 78 checks passed

CISC mentioned this pull request Dec 6, 2025

cann : fix ops broken by circular padding guard #17825

Open

gabe-l-hart mentioned this pull request Dec 10, 2025

feat: llama.cpp bump (17f7f4) for SSM performance improvements ollama/ollama#13408

Merged
