Misc. bug: Tool calling CRASH : Unexpected empty grammar stack after accepting piece<tool_call> #15608

@ExtReMLapin

Description

Name and Version

ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 3 CUDA devices:
Device 0: NVIDIA A100-PCIE-40GB, compute capability 8.0, VMM: yes
Device 1: NVIDIA A100-PCIE-40GB, compute capability 8.0, VMM: yes
Device 2: NVIDIA A100-PCIE-40GB, compute capability 8.0, VMM: yes
version: 6294 (bcbddcd)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0 for x86_64-linux-gnu

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-server

Command line

./build/bin/llama-server -ngl 999 -fa -m GLM-4.5-Air-Q5_K_M-00001-of-00002.gguf -c 131071 --jinja --reasoning-format deepseek --slots --n-predict 131071 --no-context-shift -ctk q8_0 -ctv q8_0 --parallel 1


GGUFs are here: `https://huggingface.co/unsloth/GLM-4.5-Air-GGUF/tree/main/Q5_K_M`

Problem description & steps to reproduce

When using tool calling, the server sometimes crashes (see below for the stack trace).

I first ran into this issue while working on this PR: #15248

I have since found a way to replicate this error 100% of the time on master:head, although the model involved is large.

Smaller test cases (e.g. the simple tool-calling example provided in the llama.cpp documentation) do not trigger the issue.

Here is a .js script that, when run with `node nodetest.js`, triggers the crash every single time it is called.

nodetest.js
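The attached nodetest.js is not reproduced here. As a minimal sketch of what such a reproducer looks like, the following hits llama-server's OpenAI-compatible `/v1/chat/completions` endpoint with a tool definition; the `get_weather` tool, the prompt, and the server URL are illustrative assumptions, not the exact payload that triggers the crash:

```javascript
// Sketch of a tool-calling request against llama-server's
// OpenAI-compatible chat endpoint. The tool schema, prompt, and URL
// below are illustrative placeholders, NOT the contents of nodetest.js.

function buildPayload(userPrompt) {
  return {
    // llama-server serves whichever model it was launched with;
    // the model name here is informational only.
    model: "GLM-4.5-Air",
    messages: [{ role: "user", content: userPrompt }],
    tools: [
      {
        type: "function",
        function: {
          name: "get_weather", // hypothetical tool, for illustration
          description: "Get the current weather for a city",
          parameters: {
            type: "object",
            properties: { city: { type: "string" } },
            required: ["city"],
          },
        },
      },
    ],
    tool_choice: "auto",
  };
}

async function reproduce(baseUrl) {
  const res = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildPayload("What is the weather in Paris?")),
  });
  // When the bug triggers, the server aborts mid-generation and this
  // request errors out instead of returning a tool_calls response.
  console.log(JSON.stringify(await res.json(), null, 2));
}

// Only hit the network when a server URL is explicitly provided, e.g.
//   LLAMA_SERVER_URL=http://127.0.0.1:8080 node nodetest.js
if (process.env.LLAMA_SERVER_URL) {
  reproduce(process.env.LLAMA_SERVER_URL);
}
```
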

First Bad Commit

No response

Relevant log output

srv  params_from_: Chat format: Hermes 2 Pro
slot launch_slot_: id  0 | task 0 | processing task
slot update_slots: id  0 | task 0 | new prompt, n_ctx_slot = 131072, n_keep = 0, n_prompt_tokens = 4244
slot update_slots: id  0 | task 0 | kv cache rm [0, end)
slot update_slots: id  0 | task 0 | prompt processing progress, n_past = 2048, n_tokens = 2048, progress = 0.482564
slot update_slots: id  0 | task 0 | kv cache rm [2048, end)
slot update_slots: id  0 | task 0 | prompt processing progress, n_past = 4096, n_tokens = 2048, progress = 0.965127
slot update_slots: id  0 | task 0 | kv cache rm [4096, end)
slot update_slots: id  0 | task 0 | prompt processing progress, n_past = 4244, n_tokens = 148, progress = 1.000000
slot update_slots: id  0 | task 0 | prompt done, n_past = 4244, n_tokens = 148
/home/xxxxxxxxxx/idextend/llama.cpp/build/bin/libggml-base.so(+0x16dab)[0x7fb495e8ddab]
/home/xxxxxxxxxx/idextend/llama.cpp/build/bin/libggml-base.so(ggml_print_backtrace+0x21f)[0x7fb495e8e20f]
/home/xxxxxxxxxx/idextend/llama.cpp/build/bin/libggml-base.so(+0x2975f)[0x7fb495ea075f]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c)[0x7fb495cdf20c]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae277)[0x7fb495cdf277]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae4d8)[0x7fb495cdf4d8]
/home/xxxxxxxxxx/idextend/llama.cpp/build/bin/libllama.so(+0x65f61)[0x7fb495f89f61]
/home/xxxxxxxxxx/idextend/llama.cpp/build/bin/libllama.so(_Z25llama_grammar_accept_implR13llama_grammari+0x26f)[0x7fb495fcd81f]
./build/bin/llama-server(+0x1e08ac)[0x55bff96ba8ac]
./build/bin/llama-server(+0xdfb13)[0x55bff95b9b13]
./build/bin/llama-server(+0x83c8d)[0x55bff955dc8d]
./build/bin/llama-server(+0x4af3d)[0x55bff9524f3d]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7fb495928d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7fb495928e40]
./build/bin/llama-server(+0x4c995)[0x55bff9526995]
terminate called after throwing an instance of 'std::runtime_error'
  what():  Unexpected empty grammar stack after accepting piece:
<tool_call>
Aborted (core dumped)
