
[Bug] OpenVINO backend crashes with "node index out of range" on Mistral-Small-3.2-24B (Verified backend-specific) #120

@shahkarnav115-beep

Description


Name and Version

C:\Users\karnav\openvino.genai\thirdparty\llama.cpp\build\bin\Release>.\llama-cli.exe --version
OpenVINO: using device CPU
version: 8530 (a970515)
built with MSVC 19.44.35222.0 for x64

Operating systems

Windows

GGML backends

OpenVINO

Hardware

GeForce RTX 3050, Intel i5 processor

Models

Mistral Small 3.2 24B : Mistral-Small-3.2-24B-Instruct-2506-Q4_K_M.gguf

Problem description & steps to reproduce

While validating models for the OpenVINO backend on the dev_backend_openvino branch, I hit a graph-translation crash when testing the new Mistral-Small-3.2-24B architecture.

The model loads into memory on the CPU fallback, but fails during graph_compute with an ov::Exception reporting that a node index is out of range.

To isolate the issue, I compiled and ran the exact same model with the vanilla upstream llama.cpp master branch. The upstream build executed the model flawlessly, which confirms this is not a memory/hardware limitation or a corrupted GGUF file, but a graph-translation mapping bug specific to the OpenVINO backend for this architecture.

Related to: #116

Steps to Reproduce:

llama-cli.exe -m "Mistral-Small-3.2-24B-Instruct-2506-Q4_K_M.gguf" -ngl 0 -c 1024 -n 10 -p "Hi"

First Bad Commit

No response

Relevant log output

Logs
OpenVINO: using device CPU

Loading model...
GGML OpenVINO backend ov::Exception: Exception from src\core\src\node.cpp:593:
node index is out of range

graph_compute: ggml_backend_sched_graph_compute_async failed with error -1
process_ubatch: failed to compute graph, compute status: -1
llama_decode: failed to decode, ret = -3
GGML OpenVINO backend ov::Exception: Exception from src\core\src\node.cpp:593:
node index is out of range

graph_compute: ggml_backend_sched_graph_compute_async failed with error -1
process_ubatch: failed to compute graph, compute status: -1
llama_decode: failed to decode, ret = -3
common_speculative_is_compat: llama_decode() failed: -3


▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b8530-a970515bd
model      : Mistral-Small-3.2-24B-Instruct-2506-Q4_K_M.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read               add a text file


> Hi

GGML OpenVINO backend ov::Exception: Exception from src\core\src\node.cpp:593:
node index is out of range

graph_compute: ggml_backend_sched_graph_compute_async failed with error -1
process_ubatch: failed to compute graph, compute status: -1
llama_decode: failed to decode, ret = -3
srv  update_slots: Compute error. i = 0, n_batch = 1024, ret = -3
srv    send_error: task id = 0, error: Compute error.
Error: Compute error.


[ Prompt: 0.0 t/s | Generation: 0.0 t/s ]

> /exit


Exiting...
llama_memory_breakdown_print: | memory breakdown [MiB]           | total   free     self   model   context   compute    unaccounted |
llama_memory_breakdown_print: |   - OPENVINO0 (OpenVINO Runtime) | 16068 = 8556 + (  266 =     0 +       0 +     266) +        7246 |
llama_memory_breakdown_print: |   - Host                         |                 13908 = 13662 +     160 +      86                |
