Conversation

@leejet (Owner) commented Dec 12, 2025

No description provided.

if (dim != 3) {
    x = ggml_ext_torch_permute(ctx, x, perm[0], perm[1], perm[2], perm[3]);
    x = ggml_cont(ctx, x);
if (cont) {
@leejet (Owner Author) commented:

Perhaps we can remove ggml_cont here, but I haven’t fully verified it yet, so I’ll keep ggml_cont here for now.

@stduhpf (Contributor) commented Dec 12, 2025:

Wouldn't something like this work in all cases? It seems to work for unet models at least. The code is simpler and it seems (slightly) faster than using permutations.

// Split x into `num` equal chunks along dimension `dim`, using views with the
// original strides instead of a permute + cont round-trip.
__STATIC_INLINE__ std::vector<struct ggml_tensor*> ggml_ext_chunk(struct ggml_context* ctx,
                                                                  struct ggml_tensor* x,
                                                                  int num,
                                                                  int64_t dim,
                                                                  bool cont = true) {
    GGML_ASSERT(dim >= 0 && dim < 4);
    GGML_ASSERT(x->ne[dim] % num == 0);

    std::vector<struct ggml_tensor*> chunks;
    int64_t chunk_size  = x->ne[dim] / num;
    int64_t stride      = chunk_size * x->nb[dim];  // byte offset between consecutive chunks
    int64_t chunk_ne[4] = {x->ne[0], x->ne[1], x->ne[2], x->ne[3]};
    chunk_ne[dim]       = chunk_size;
    for (int i = 0; i < num; i++) {
        // each chunk is a view into x: same strides, shifted by i chunks along `dim`
        auto chunk = ggml_view_4d(
            ctx, x,
            chunk_ne[0], chunk_ne[1], chunk_ne[2], chunk_ne[3],
            x->nb[1], x->nb[2], x->nb[3], stride * i);
        if (cont) {
            chunk = ggml_cont(ctx, chunk);
        }
        chunks.push_back(chunk);
    }

    return chunks;
}
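
A call site could then look something like this (a minimal sketch; the tensor name qkv_proj and the split along dim 0 are illustrative assumptions, not code from this PR):

// hypothetical usage: split a packed projection into 3 equal chunks along dim 0
std::vector<struct ggml_tensor*> qkv = ggml_ext_chunk(ctx, qkv_proj, 3, 0);
struct ggml_tensor* q = qkv[0];
struct ggml_tensor* k = qkv[1];
struct ggml_tensor* v = qkv[2];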

@stduhpf (Contributor) commented:

Maybe it's not really faster; it seems to be within the margin of error for run-to-run variation.

@stduhpf (Contributor) commented:

Also, it doesn't cause #1080 (comment).

@leejet (Owner Author) commented:

@stduhpf This change looks like a simpler optimization. Could you open a separate PR for it? I’ll close this PR.

@wbruna (Contributor) commented Dec 12, 2025:

0835e5c broke sd1.5:

[comparison images: master-408 (teste_1765561010) vs 0835e5c (teste_1765560820)]

@stduhpf (Contributor) commented Dec 12, 2025:

@wbruna, Oh, you're right, I was only looking at the speed.

@daniandtheweb (Contributor) commented:

> 0835e5c broke sd1.5:

Same on SDXL.

@wbruna (Contributor) commented Dec 12, 2025:

Testing each version on SD1.5: when compared with 59ebdf0, #1079 seems almost as fast on Vulkan, and around 9% slower on ROCm. The ggml_ext_chunk suggested above is ~3-4% slower on both:

version                         Vulkan      ROCm
59ebdf0                         2.65s/it    2.34s/it
347710f (and current master)    3.65s/it    3.44s/it
ggml_ext_chunk above            2.75s/it    2.41s/it
#1079                           2.69s/it    2.54s/it

@leejet (Owner Author) commented Dec 13, 2025:

> 0835e5c broke sd1.5:
> [comparison images: master-408 vs 0835e5c]

It looks like the implementations of the CUDA backend and the Vulkan backend are a bit different. I was able to reproduce it with the Vulkan backend as well, but everything works fine with the CUDA backend.
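
A minimal sketch of how one might narrow this down (not code from this PR; it assumes a chunks vector like the one returned by ggml_ext_chunk above): check whether any chunk handed to a later op is still a non-contiguous view, since backends can differ in which ops accept non-contiguous inputs.

// hedged debugging sketch: report chunks that are still non-contiguous views
for (struct ggml_tensor* t : chunks) {
    if (!ggml_is_contiguous(t)) {
        fprintf(stderr, "non-contiguous chunk: %s\n", ggml_get_name(t));
    }
}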
