
Change k_cache and k_raw to use ggml_view_3d to fix np > 1 launch abort#10

Open
kstjohn1 wants to merge 1 commit into antirez:main from kstjohn1:main

Conversation

@kstjohn1

When using np > 1 the app crashes on launch:

/Users/admin/Downloads/llama.cpp-deepseek-v4-flash/ggml/src/ggml.c:3643: GGML_ASSERT(ggml_nelements(a) == ne0*ne1*ne2) failed
WARNING: Using native backtrace. Set GGML_BACKTRACE_LLDB for more info.
See: https://github.com/ggml-org/llama.cpp/pull/17869
0   libggml-base.0.10.0.dylib           0x000000010509d3d0 ggml_print_backtrace + 276
1   libggml-base.0.10.0.dylib           0x00000001051080bc ggml_abort + 156
2   libggml-base.0.10.0.dylib           0x0000000105108b50 ggml_reshape_4d.cold.1 + 0
3   libggml-base.0.10.0.dylib           0x00000001050a45c4 ggml_reshape_3d + 312
4   libllama.0.0.8927.dylib             0x000000010581d440 _ZN19llm_build_deepseek4C2ERK11llama_modelRK16llm_graph_params + 1748
5   libllama.0.0.8927.dylib             0x00000001057c82d0 _ZNSt3__111make_uniqueB9nqe210106I19llm_build_deepseek4JRK11llama_modelRK16llm_graph_paramsELi0EEENS_10unique_ptrIT_NS_14default_deleteIS9_EEEEDpOT0_ + 52
6   libllama.0.0.8927.dylib             0x00000001057c69ac _ZNK11llama_model11build_graphERK16llm_graph_params + 1816
7   libllama.0.0.8927.dylib             0x000000010570448c _ZN13llama_context13graph_reserveEjjjPK22llama_memory_context_ibPmi + 776
8   libllama.0.0.8927.dylib             0x0000000105702dd4 _ZN13llama_context13sched_reserveEv + 616
9   libllama.0.0.8927.dylib             0x0000000105701ccc _ZN13llama_contextC2ERK11llama_model20llama_context_params + 4196
10  libllama.0.0.8927.dylib             0x000000010570ba48 llama_init_from_model + 600
11  libllama-common.0.0.8927.dylib      0x0000000105320548 _ZL29common_get_device_memory_dataPKcPK18llama_model_paramsPK20llama_context_paramsRNSt3__16vectorIP19ggml_backend_deviceNS7_9allocatorISA_EEEERjSF_SF_14ggml_log_level + 232
12  libllama-common.0.0.8927.dylib      0x000000010531ad88 _ZL22common_params_fit_implPKcP18llama_model_paramsP20llama_context_paramsPfP32llama_model_tensor_buft_overridePmj14ggml_log_level + 180
18  dyld                                0x000000018dcc7da4 start + 6992
Abort trap: 6

Overview

This change updates deepseek4.cpp to use ggml_view_3d rather than ggml_reshape_3d in two places (k_cache and k_raw).

Additional information

What's the difference?

  • ggml_reshape_3d requires the source tensor to contain exactly ne0 * ne1 * ne2 elements, and asserts this.
  • ggml_view_3d does not require the element counts to match. It creates a sub-view into the source tensor starting at a given byte offset, with the specified dimensions and strides; it simply extracts a contiguous slice.

Why this works:

The get_k cache view returns a 4D tensor [n_embd_head_k, 1, n_kv, ns] where ns = n_stream. The view at offset 0 selects the first n_embd_head_k * 1 * n_kv elements (stream 0). Since DeepSeek V4's reservation forces n_seqs=1, only stream 0's data is relevant during reservation — exactly what the view extracts.

Implications:

  • np = 1 (single parallel): the cache has ns = 1, so the 4D tensor [n_embd_head_k, 1, n_kv, 1] has the same memory layout as the 3D tensor. ggml_view_3d at offset 0 covers all of the data, which is identical in behavior to the current ggml_reshape_3d.
  • np = 2 (double parallel): the cache has ns = 2. ggml_view_3d at offset 0 extracts stream 0's n_embd_head_k * 1 * n_kv elements. The assertion no longer fires because a view does not require the element counts to match; it just slices.
  • During inference: the cache's prepare() yields ns = 1 per sequence (single-stream slots), so this behaves identically to the np = 1 case.
  • Memory allocation: like reshape, the view is a no-copy operation. Compute buffer allocation is unaffected; reservation already uses n_seqs = 1, so the allocated buffers match the single-stream view.

Potential concern: the k_cache->nb[1] and k_cache->nb[2] strides must match the 4D view's strides. For DeepSeek V4 with n_head_kv = 1, k_cache->nb[1] = k_cache->nb[0] (stride for the head dim), and k_cache->nb[2] is the stride across n_kv. These are exactly what get_k returns, so the view correctly steps through the first stream's data.

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: Yes, Deepseekv4-flash was used to identify the needed change. The change was made locally and tested; the model works as expected and no longer aborts when np > 1.

@github-actions github-actions Bot added the model label May 13, 2026
@kstjohn1 kstjohn1 changed the title Change k_cache and k_raw to use ggml_view_3d Change k_cache and k_raw to use ggml_view_3d to fix np > 1 launch abort May 13, 2026
