GGML_OPENVINO_STATEFUL_EXECUTION=1 GGML_OPENVINO_DEVICE=GPU ./build/ReleaseOV/bin/llama-cli -c 64 -m ~/models/Llama-3.2-1B-Instruct-Q4_0.gguf --context-shift
...
...
llama.cpp/ggml/src/ggml-backend.cpp:809: pre-allocated tensor (cache_k_l0 (view) (view)) in a buffer (OPENVINO0) that cannot run the operation (ROPE)
[New LWP 75786]
[New LWP 75760]
[New LWP 75759]
[New LWP 75757]
[New LWP 75756]
[New LWP 75755]
[New LWP 75754]
[New LWP 75753]
[New LWP 75752]
[New LWP 75751]
[New LWP 75745]
[New LWP 75744]
[New LWP 75743]
[New LWP 75742]
[New LWP 75741]
[New LWP 75740]
[New LWP 75739]
[New LWP 75708]
This GDB supports auto-downloading debuginfo from the following URLs:
<https://debuginfod.ubuntu.com>
Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007a0ed6098d71 in __futex_abstimed_wait_common64 (private=721128320, cancel=true, abstime=0x7ffc2afb8bc0, op=137, expected=0, futex_word=0x6376402ba488) at ./nptl/futex-internal.c:57
warning: 57 ./nptl/futex-internal.c: No such file or directory
#0 0x00007a0ed6098d71 in __futex_abstimed_wait_common64 (private=721128320, cancel=true, abstime=0x7ffc2afb8bc0, op=137, expected=0, futex_word=0x6376402ba488) at ./nptl/futex-internal.c:57
57 in ./nptl/futex-internal.c
#1 __futex_abstimed_wait_common (cancel=true, private=721128320, abstime=0x7ffc2afb8bc0, clockid=32764, expected=0, futex_word=0x6376402ba488) at ./nptl/futex-internal.c:87
87 in ./nptl/futex-internal.c
#2 __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x6376402ba488, expected=expected@entry=0, clockid=clockid@entry=1, abstime=abstime@entry=0x7ffc2afb8bc0, private=private@entry=0) at ./nptl/futex-internal.c:139
139 in ./nptl/futex-internal.c
#3 0x00007a0ed609c116 in __pthread_cond_wait_common (abstime=<optimized out>, clockid=<optimized out>, mutex=0x6376402ba438, cond=0x6376402ba460) at ./nptl/pthread_cond_wait.c:503
warning: 503 ./nptl/pthread_cond_wait.c: No such file or directory
#4 ___pthread_cond_clockwait64 (abstime=<optimized out>, clockid=<optimized out>, mutex=0x6376402ba438, cond=0x6376402ba460) at ./nptl/pthread_cond_wait.c:691
691 in ./nptl/pthread_cond_wait.c
#5 ___pthread_cond_clockwait64 (cond=0x6376402ba460, mutex=0x6376402ba438, clockid=<optimized out>, abstime=<optimized out>) at ./nptl/pthread_cond_wait.c:679
679 in ./nptl/pthread_cond_wait.c
#6 0x000063762573de9e in server_response::recv_with_timeout(std::unordered_set<int, std::hash<int>, std::equal_to<int>, std::allocator<int> > const&, int) ()
#7 0x0000637625743da2 in server_response_reader::next(std::function<bool ()> const&) ()
#8 0x00006376256ecc85 in cli_context::generate_completion[abi:cxx11](result_timings&) ()
#9 0x00006376256d1266 in main ()
[Inferior 1 (process 75692) detached]
Aborted (core dumped)
Name and Version
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-cli
Command line
Problem description & steps to reproduce
When using `llama-cli` with the OpenVINO backend and `--context-shift` enabled, the application crashes with a fatal assertion error in `ggml-backend.cpp:809`. This occurs across CPU, GPU, and NPU devices.

Key observations:

- The failing tensor is `cache_k_l0 (view)`, a viewed/sliced tensor from the KV cache
- The operation that cannot be run is `ROPE`
- The crash occurs only when `--context-shift` is enabled, which modifies KV cache management

Relevant log output
Logs