Name and Version
C:\Users\karnav\openvino.genai\thirdparty\llama.cpp\build\bin\Release>.\llama-cli.exe --version
OpenVINO: using device CPU
version: 8530 (a970515)
built with MSVC 19.44.35222.0 for x64
Operating systems
Windows
GGML backends
OpenVINO
Hardware
GeForce RTX 3050, Intel Core i5 processor
Models
Mistral Small 3.2 24B : Mistral-Small-3.2-24B-Instruct-2506-Q4_K_M.gguf
Problem description & steps to reproduce
While validating models for the OpenVINO backend on the dev_backend_openvino branch, I encountered a translation crash when testing the new Mistral-Small-3.2-24B architecture.
The model loads into memory successfully on the CPU fallback, but fails during graph_compute with an ov::Exception reporting that a node index is out of range.
To isolate the issue, I compiled and ran the exact same model using the vanilla upstream llama.cpp master branch. The upstream build executed the model flawlessly, confirming this is not a memory/hardware limitation or a corrupted GGUF file, but rather a graph translation mapping bug specific to the OpenVINO backend for this architecture.
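The isolation step above can be sketched as a side-by-side run of the same model and flags against both builds. This is a minimal sketch, not the exact commands used: the build directory paths (openvino-build, upstream-build) are placeholders, and only the model filename and flags come from the report.

```shell
# Same model and flags for both builds (taken from the repro command below).
MODEL="Mistral-Small-3.2-24B-Instruct-2506-Q4_K_M.gguf"
ARGS="-ngl 0 -c 1024 -n 10 -p Hi"

# dev_backend_openvino build -- crashes with ov::Exception (placeholder path):
#   ./openvino-build/bin/llama-cli -m "$MODEL" $ARGS

# vanilla upstream master build -- runs cleanly (placeholder path):
#   ./upstream-build/bin/llama-cli -m "$MODEL" $ARGS

echo "repro command: llama-cli -m $MODEL $ARGS"
```

Since only the backend differs between the two runs, a clean upstream run points at the OpenVINO graph translation rather than the GGUF file or the hardware.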
Related to: #116
Steps to Reproduce:
llama-cli.exe -m "Mistral-Small-3.2-24B-Instruct-2506-Q4_K_M.gguf" -ngl 0 -c 1024 -n 10 -p "Hi"
First Bad Commit
No response
Relevant log output
Logs
OpenVINO: using device CPU
Loading model...
GGML OpenVINO backend ov::Exception: Exception from src\core\src\node.cpp:593:
node index is out of range
graph_compute: ggml_backend_sched_graph_compute_async failed with error -1
process_ubatch: failed to compute graph, compute status: -1
llama_decode: failed to decode, ret = -3
GGML OpenVINO backend ov::Exception: Exception from src\core\src\node.cpp:593:
node index is out of range
graph_compute: ggml_backend_sched_graph_compute_async failed with error -1
process_ubatch: failed to compute graph, compute status: -1
llama_decode: failed to decode, ret = -3
common_speculative_is_compat: llama_decode() failed: -3
▄▄ ▄▄
██ ██
██ ██ ▀▀█▄ ███▄███▄ ▀▀█▄ ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██ ██ ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
██ ██
▀▀ ▀▀
build : b8530-a970515bd
model : Mistral-Small-3.2-24B-Instruct-2506-Q4_K_M.gguf
modalities : text
available commands:
/exit or Ctrl+C stop or exit
/regen regenerate the last response
/clear clear the chat history
/read add a text file
> Hi
GGML OpenVINO backend ov::Exception: Exception from src\core\src\node.cpp:593:
node index is out of range
graph_compute: ggml_backend_sched_graph_compute_async failed with error -1
process_ubatch: failed to compute graph, compute status: -1
llama_decode: failed to decode, ret = -3
srv update_slots: Compute error. i = 0, n_batch = 1024, ret = -3
srv send_error: task id = 0, error: Compute error.
Error: Compute error.
[ Prompt: 0.0 t/s | Generation: 0.0 t/s ]
> /exit
Exiting...
llama_memory_breakdown_print: | memory breakdown [MiB] | total free self model context compute unaccounted |
llama_memory_breakdown_print: | - OPENVINO0 (OpenVINO Runtime) | 16068 = 8556 + ( 266 = 0 + 0 + 266) + 7246 |
llama_memory_breakdown_print: | - Host | 13908 = 13662 + 160 + 86 |