Change k_cache and k_raw to use ggml_view_3d to fix np > 1 launch abort #10
Open
kstjohn1 wants to merge 1 commit into
When launched with np > 1 (more than one parallel sequence), the app aborts on startup:
/Users/admin/Downloads/llama.cpp-deepseek-v4-flash/ggml/src/ggml.c:3643: GGML_ASSERT(ggml_nelements(a) == ne0*ne1*ne2) failed
WARNING: Using native backtrace. Set GGML_BACKTRACE_LLDB for more info.
See: ggml-org/llama.cpp#17869
0 libggml-base.0.10.0.dylib 0x000000010509d3d0 ggml_print_backtrace + 276
1 libggml-base.0.10.0.dylib 0x00000001051080bc ggml_abort + 156
2 libggml-base.0.10.0.dylib 0x0000000105108b50 ggml_reshape_4d.cold.1 + 0
3 libggml-base.0.10.0.dylib 0x00000001050a45c4 ggml_reshape_3d + 312
4 libllama.0.0.8927.dylib 0x000000010581d440 _ZN19llm_build_deepseek4C2ERK11llama_modelRK16llm_graph_params + 1748
5 libllama.0.0.8927.dylib 0x00000001057c82d0 _ZNSt3__111make_uniqueB9nqe210106I19llm_build_deepseek4JRK11llama_modelRK16llm_graph_paramsELi0EEENS_10unique_ptrIT_NS_14default_deleteIS9_EEEEDpOT0_ + 52
6 libllama.0.0.8927.dylib 0x00000001057c69ac _ZNK11llama_model11build_graphERK16llm_graph_params + 1816
7 libllama.0.0.8927.dylib 0x000000010570448c _ZN13llama_context13graph_reserveEjjjPK22llama_memory_context_ibPmi + 776
8 libllama.0.0.8927.dylib 0x0000000105702dd4 _ZN13llama_context13sched_reserveEv + 616
9 libllama.0.0.8927.dylib 0x0000000105701ccc _ZN13llama_contextC2ERK11llama_model20llama_context_params + 4196
10 libllama.0.0.8927.dylib 0x000000010570ba48 llama_init_from_model + 600
11 libllama-common.0.0.8927.dylib 0x0000000105320548 _ZL29common_get_device_memory_dataPKcPK18llama_model_paramsPK20llama_context_paramsRNSt3__16vectorIP19ggml_backend_deviceNS7_9allocatorISA_EEEERjSF_SF_14ggml_log_level + 232
12 libllama-common.0.0.8927.dylib 0x000000010531ad88 _ZL22common_params_fit_implPKcP18llama_model_paramsP20llama_context_paramsPfP32llama_model_tensor_buft_overridePmj14ggml_log_level + 180
18 dyld 0x000000018dcc7da4 start + 6992
Abort trap: 6
This change updates deepseek4.cpp to use ggml_view_3d instead of ggml_reshape_3d in two places, for k_cache and k_raw.
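The following is a minimal plain-C model of the element-count check that fires in the log (ggml_nelements(a) == ne0*ne1*ne2). It shows why the reshape aborts only when there is more than one stream. It does not call ggml. The names shape4d, nelements, reshape_3d_ok and k_cache_shape are hypothetical stand-ins. The sizes are made up. The model assumes that the cache tensor has shape [n_embd_head_k, 1, n_kv, ns] and that ns > 1 whenever np > 1 (each parallel sequence gets its own stream). The real ggml_reshape_3d also asserts that its input is contiguous; this sketch checks only the element count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a tensor shape; NOT struct ggml_tensor.
 * Only the element counts ne[0..3] matter for this check. */
typedef struct {
    int64_t ne[4];
} shape4d;

/* Total element count of a 4D shape (what ggml_nelements reports). */
static int64_t nelements(shape4d s) {
    return s.ne[0] * s.ne[1] * s.ne[2] * s.ne[3];
}

/* Models the element-count assertion in ggml_reshape_3d:
 * returns false exactly where ggml would abort. */
static bool reshape_3d_ok(shape4d a, int64_t ne0, int64_t ne1, int64_t ne2) {
    return nelements(a) == ne0 * ne1 * ne2;
}

/* Assumed cache shape: [n_embd_head_k, 1 (n_head_kv), n_kv, ns]. */
static shape4d k_cache_shape(int64_t n_embd_head_k, int64_t n_kv, int64_t ns) {
    shape4d s = { { n_embd_head_k, 1, n_kv, ns } };
    return s;
}
```

With ns = 1, reshaping to a 3D shape with n_embd_head_k * 1 * n_kv elements passes. With ns = 2, the source holds twice as many elements, so the check fails. A view avoids this because it only describes a sub-region and never requires the counts to match.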
Additional information
What's the difference?
Why this works:
The cache view returned by get_k is a 4D tensor [n_embd_head_k, 1, n_kv, ns], where ns = n_stream. ggml_reshape_3d requires the total element count to match the target shape, so it aborts as soon as ns > 1. A 3D view at offset 0 instead selects only the first n_embd_head_k * 1 * n_kv elements, which is stream 0. DeepSeek V4's reservation forces n_seqs=1, so only stream 0's data is relevant during reservation, and that is exactly what the view extracts.
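The next sketch illustrates the claim that an offset-0 view reads only stream 0. It models ggml-style byte-stride addressing in plain C on a small packed [D, 1, n_kv, ns] float buffer. Each value encodes its stream (stream * 1000 + index within the stream), so a read from the wrong stream is easy to detect. The function names and sizes are hypothetical. The model also assumes a packed layout where stream s starts immediately after stream s-1; the real cache may use a larger per-stream stride.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models ggml_view_3d addressing: element (i0, i1, i2) of the view lives at
 * byte offset + i0*elsize + i1*nb1 + i2*nb2 of the parent data. No copy. */
static const float *view3d_at(const void *data, size_t nb1, size_t nb2,
                              size_t offset, int64_t i0, int64_t i1, int64_t i2) {
    const char *base = (const char *) data;
    return (const float *) (base + offset
                            + (size_t) i0 * sizeof(float)
                            + (size_t) i1 * nb1
                            + (size_t) i2 * nb2);
}

/* Fills a packed [D, 1, n_kv, ns] buffer; value = stream*1000 + index in stream.
 * Assumes D*n_kv < 1000 so values from different streams never collide. */
static void fill_streams(float *buf, int64_t D, int64_t n_kv, int64_t ns) {
    for (int64_t s = 0; s < ns; ++s)
        for (int64_t i = 0; i < D * n_kv; ++i)
            buf[s * D * n_kv + i] = (float) (s * 1000 + i);
}

/* Returns 1 if every element of the offset-0 view [D, 1, n_kv] comes from
 * stream 0 (value below 1000), else 0. */
static int view_reads_only_stream0(const float *buf, int64_t D, int64_t n_kv) {
    const size_t nb1 = (size_t) D * sizeof(float); /* stride of the size-1 head dim */
    const size_t nb2 = nb1 * 1;                    /* n_head_kv = 1, so nb2 == nb1 */
    for (int64_t i2 = 0; i2 < n_kv; ++i2)
        for (int64_t i0 = 0; i0 < D; ++i0)
            if (*view3d_at(buf, nb1, nb2, 0, i0, 0, i2) >= 1000.0f)
                return 0;
    return 1;
}
```

Moving the offset forward by one stream's size (D * n_kv * sizeof(float) bytes in this packed model) would make the same view read stream 1 instead. The offset, not the shape, selects the stream.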
Implications:
Potential concern: the k_cache->nb[1] and k_cache->nb[2] strides passed to ggml_view_3d must match the 4D view's strides. For DeepSeek V4, n_head_kv = 1, so dimension 1 has size 1. This means k_cache->nb[1] (the head-dimension stride) is never actually stepped, and k_cache->nb[2] is the stride across n_kv. Both strides are taken directly from the tensor that get_k returns, so the 3D view walks through the first stream's data with the same addressing as the 4D view.
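The following is a plain-C model of the stride reasoning. It assumes a contiguous layout with hypothetical sizes and a 2-byte element (as for f16). It computes byte strides using ggml's rule for non-quantized contiguous tensors: nb[0] is the element size, and nb[i] = nb[i-1] * ne[i-1]. With ne[1] = n_head_kv = 1, this gives nb[2] == nb[1]. The model then checks that an offset-0 3D view reusing the parent's nb[1] and nb[2] ends exactly where stream 1 begins (nb[3]). The function names are hypothetical, and the real get_k view may use a larger per-stream stride than this packed model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte strides of a contiguous 4D tensor (non-quantized type):
 * nb[0] = element size, nb[i] = nb[i-1] * ne[i-1]. */
static void contiguous_strides(const int64_t ne[4], size_t elsize, size_t nb[4]) {
    nb[0] = elsize;
    for (int i = 1; i < 4; ++i)
        nb[i] = nb[i - 1] * (size_t) ne[i - 1];
}

/* Byte offset of element (i0, i1, i2) in a 3D view that reuses the parent's
 * nb[1] and nb[2], mirroring the stride/offset arguments of ggml_view_3d. */
static size_t view3d_offset(const size_t nb[4], size_t offset,
                            int64_t i0, int64_t i1, int64_t i2) {
    return offset + (size_t) i0 * nb[0] + (size_t) i1 * nb[1] + (size_t) i2 * nb[2];
}

/* One past the last byte read by an offset-0 view of shape
 * [ne[0], ne[1], ne[2]]: the extent of the parent that the view touches. */
static size_t view3d_end(const int64_t ne[4], const size_t nb[4], size_t elsize) {
    return view3d_offset(nb, 0, ne[0] - 1, ne[1] - 1, ne[2] - 1) + elsize;
}
```

For example, with ne = {128, 1, 64, 4} and 2-byte elements, nb[1] and nb[2] are both 256 bytes. The view's last byte is at 16383, so its end offset is 16384, which equals nb[3], the start of stream 1.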
Requirements