optimize ggml_ext_chunk #1080
Conversation
if (dim != 3) {
    x = ggml_ext_torch_permute(ctx, x, perm[0], perm[1], perm[2], perm[3]);
    x = ggml_cont(ctx, x);
if (cont) {
Perhaps we can remove ggml_cont here, but I haven’t fully verified it yet, so I’ll keep ggml_cont here for now.
Wouldn't something like this work in all cases? It seems to work for UNet models at least. The code is simpler, and it seems (slightly) faster than using permutations.
__STATIC_INLINE__ std::vector<struct ggml_tensor*> ggml_ext_chunk(struct ggml_context* ctx,
                                                                  struct ggml_tensor* x,
                                                                  int num,
                                                                  int64_t dim,
                                                                  bool cont = true) {
    GGML_ASSERT(dim >= 0 && dim < 4);
    GGML_ASSERT(x->ne[dim] % num == 0);
    std::vector<struct ggml_tensor*> chunks;
    int64_t chunk_size  = x->ne[dim] / num;
    int64_t stride      = chunk_size * x->nb[dim];
    int64_t chunk_ne[4] = {x->ne[0], x->ne[1], x->ne[2], x->ne[3]};
    chunk_ne[dim]       = chunk_size;
    for (int i = 0; i < num; i++) {
        auto chunk = ggml_view_4d(
            ctx, x,
            chunk_ne[0], chunk_ne[1], chunk_ne[2], chunk_ne[3],
            x->nb[1], x->nb[2], x->nb[3], stride * i);
        if (cont) {
            chunk = ggml_cont(ctx, chunk);
        }
        chunks.push_back(chunk);
    }
    return chunks;
}
Maybe it's not really faster; it seems to be within the margin of error for run-to-run variation.
Also, it doesn't cause the issue reported in #1080 (comment).
@stduhpf This change looks like a simpler optimization. Could you open a separate PR for it? I’ll close this PR.
@wbruna, oh, you're right; I was only looking at the speed.
Same on SDXL.
Testing each version on SD1.5: when compared with 59ebdf0, #1079 seems almost as fast on Vulkan, and around 9% slower on ROCm. The
[four benchmark screenshots]