
Conversation

@catap (Contributor) commented Dec 13, 2025

No description provided.

endif()
# see https://github.com/ggerganov/ggml/pull/682
add_definitions(-DGGML_MAX_NAME=128)

@wbruna (Contributor)

Why are you undoing f7f05fb?

@catap (Contributor, Author)

Because, as @leejet pointed out in ggml-org/ggml#682, it should be increased from the default of 64 to 128.

@wbruna (Contributor)

You are pointing to a ggml PR that was included in sd.cpp more than a year before f7f05fb.

I suggest you carefully read that commit and its associated PR. If you got a build error because of it, it's working as intended: it's preventing you from linking against a ggml library built with an incompatible value, which would appear to work but give you a broken binary.
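
For context, GGML_MAX_NAME sizes the fixed name[] buffer embedded in struct ggml_tensor, so the value is baked into ggml's ABI. A simplified sketch of the layout dependency (not the actual ggml declaration, and not the check f7f05fb added):

// Sketch only: ggml.h embeds the tensor name inline, roughly like this
// (most fields omitted):
struct ggml_tensor_sketch {
    // ... type, ne[], nb[], op fields ...
    char name[GGML_MAX_NAME]; // 64 by default, 128 with -DGGML_MAX_NAME=128
    // ... every field after `name` shifts if the application headers and
    //     the ggml library disagree on GGML_MAX_NAME ...
};
// Such a mismatch still links cleanly; only accesses past `name` land on
// the wrong bytes at runtime, which is exactly the "appears to work, but
// broken binary" failure mode described above.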

if (mask_pad > 0) {
    mask_in = ggml_pad(ctx, mask_in, 0, mask_pad, 0, 0);
}
mask_in = ggml_cast(ctx, mask_in, GGML_TYPE_F16);

@wbruna (Contributor)

And what's the reason for simply removing the padding here?

@catap (Contributor, Author)

Because it was simply removed from ggml :)

See: ggml-org/llama.cpp#17910

@wbruna (Contributor)

Ah, I see.

But you are assuming we'll only need to build against the new version once it gets updated. Most of the time that's inevitable, due to an incompatible API, but being able to alternate between versions is very useful for tracking down bugs introduced by the version change. And in this case, we can. I suggest doing something like this:

diff --git a/ggml_extend.hpp b/ggml_extend.hpp
index 92dd3b8..1d32f0e 100644
--- a/ggml_extend.hpp
+++ b/ggml_extend.hpp
@@ -1268,6 +1268,9 @@ __STATIC_INLINE__ struct ggml_tensor* ggml_ext_attention_ext(struct ggml_context
         }
 
         if (mask_in != nullptr) {
+            // the need for padding got removed in ggml 4767bda
+            // ensure we can still use the old version for now
+            #ifdef GGML_KQ_MASK_PAD
             int mask_pad = 0;
             if (mask_in->ne[1] % GGML_KQ_MASK_PAD != 0) {
                 mask_pad = GGML_PAD(L_q, GGML_KQ_MASK_PAD) - mask_in->ne[1];
@@ -1275,6 +1278,7 @@ __STATIC_INLINE__ struct ggml_tensor* ggml_ext_attention_ext(struct ggml_context
             if (mask_pad > 0) {
                 mask_in = ggml_pad(ctx, mask_in, 0, mask_pad, 0, 0);
             }
+            #endif
             mask_in = ggml_cast(ctx, mask_in, GGML_TYPE_F16);
         }
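
As a worked example of the padding arithmetic the #ifdef branch preserves (illustrative values only; GGML_KQ_MASK_PAD's real value comes from the pre-removal ggml headers):

// GGML_PAD as defined in ggml.h rounds x up to a multiple of n (n a power of two):
#define GGML_PAD(x, n) (((x) + (n) - 1) & ~((n) - 1))
// With a hypothetical L_q == mask_in->ne[1] == 77 and a pad multiple of 32:
static_assert(GGML_PAD(77, 32) == 96, "77 mask rows round up to 96");
// so mask_pad == 96 - 77 == 19 extra mask rows before the cast to F16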

@catap (Contributor, Author) commented Dec 13, 2025

@wbruna thanks for your feedback; I've incorporated everything.

@Green-Sky (Contributor)

Has anyone looked at the performance of flash attention yet?

@wbruna (Contributor) commented Dec 13, 2025

For me, SDXL 1024x1024 got slightly faster: something like 2% on Vulkan and 4% on ROCm (compared with the results from the ggml_ext_chunk PR).

@leejet merged commit 614f873 into leejet:master on Dec 13, 2025 (9 checks passed).
@catap deleted the ggml-sync branch on December 13, 2025 at 17:34.