Name and Version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 3 CUDA devices:
Device 0: NVIDIA A100-PCIE-40GB, compute capability 8.0, VMM: yes
Device 1: NVIDIA A100-PCIE-40GB, compute capability 8.0, VMM: yes
Device 2: NVIDIA A100-PCIE-40GB, compute capability 8.0, VMM: yes
version: 6294 (bcbddcd)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
./build/bin/llama-server -ngl 999 -fa -m GLM-4.5-Air-Q5_K_M-00001-of-00002.gguf -c 131071 --jinja --reasoning-format deepseek --slots --n-predict 131071 --no-context-shift -ctk q8_0 -ctv q8_0 --parallel 1
The GGUFs are here: `https://huggingface.co/unsloth/GLM-4.5-Air-GGUF/tree/main/Q5_K_M`
Problem description & steps to reproduce
When using tool calling, the server sometimes crashes (see below for the stack trace).
I first encountered this issue while working on PR #15248.
I found a way to reproduce the error 100% of the time on master HEAD, though the model is large.
I cannot reproduce the issue with smaller examples (the simple tool-calling example provided in the llama.cpp documentation works fine).
A Node.js script, run with `node nodetest.js`, triggers the crash every single time it is called.
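The reproduction script itself is not reproduced here. As a hedged sketch only, assuming llama-server's OpenAI-compatible `/v1/chat/completions` endpoint on the default port 8080, a tool-calling request of the kind involved might look like this (the tool definition and prompt are placeholders, not the original reproduction):

```javascript
// Hypothetical sketch, not the original nodetest.js: sends one
// tool-calling chat completion request to a locally running llama-server.
function buildRequest() {
  return {
    messages: [
      { role: "user", content: "What is the weather in Paris?" },
    ],
    tools: [
      {
        type: "function",
        function: {
          name: "get_weather", // placeholder tool, not from the report
          description: "Get the current weather for a city",
          parameters: {
            type: "object",
            properties: { city: { type: "string" } },
            required: ["city"],
          },
        },
      },
    ],
  };
}

async function main() {
  try {
    // Assumes llama-server is listening on its default port 8080.
    const res = await fetch("http://localhost:8080/v1/chat/completions", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(buildRequest()),
    });
    console.log(await res.json());
  } catch (err) {
    console.error("request failed (is llama-server running?):", err.message);
  }
}

main();
```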
First Bad Commit
No response
Relevant log output
srv params_from_: Chat format: Hermes 2 Pro
slot launch_slot_: id 0 | task 0 | processing task
slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 131072, n_keep = 0, n_prompt_tokens = 4244
slot update_slots: id 0 | task 0 | kv cache rm [0, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 2048, n_tokens = 2048, progress = 0.482564
slot update_slots: id 0 | task 0 | kv cache rm [2048, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 4096, n_tokens = 2048, progress = 0.965127
slot update_slots: id 0 | task 0 | kv cache rm [4096, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 4244, n_tokens = 148, progress = 1.000000
slot update_slots: id 0 | task 0 | prompt done, n_past = 4244, n_tokens = 148
/home/xxxxxxxxxx/idextend/llama.cpp/build/bin/libggml-base.so(+0x16dab)[0x7fb495e8ddab]
/home/xxxxxxxxxx/idextend/llama.cpp/build/bin/libggml-base.so(ggml_print_backtrace+0x21f)[0x7fb495e8e20f]
/home/xxxxxxxxxx/idextend/llama.cpp/build/bin/libggml-base.so(+0x2975f)[0x7fb495ea075f]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c)[0x7fb495cdf20c]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae277)[0x7fb495cdf277]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae4d8)[0x7fb495cdf4d8]
/home/xxxxxxxxxx/idextend/llama.cpp/build/bin/libllama.so(+0x65f61)[0x7fb495f89f61]
/home/xxxxxxxxxx/idextend/llama.cpp/build/bin/libllama.so(_Z25llama_grammar_accept_implR13llama_grammari+0x26f)[0x7fb495fcd81f]
./build/bin/llama-server(+0x1e08ac)[0x55bff96ba8ac]
./build/bin/llama-server(+0xdfb13)[0x55bff95b9b13]
./build/bin/llama-server(+0x83c8d)[0x55bff955dc8d]
./build/bin/llama-server(+0x4af3d)[0x55bff9524f3d]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7fb495928d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7fb495928e40]
./build/bin/llama-server(+0x4c995)[0x55bff9526995]
terminate called after throwing an instance of 'std::runtime_error'
what(): Unexpected empty grammar stack after accepting piece:
<tool_call>
Aborted (core dumped)
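For context, the `llama_grammar_accept_impl` frame in the backtrace belongs to grammar-constrained sampling, which the `--jinja` tool-calling path uses to constrain the model's tool-call syntax. The following is a heavily simplified, hedged sketch of the failure mode only, not llama.cpp's actual data structures: candidate parse stacks are advanced one character at a time, and when no stack survives an accepted piece, the real code throws the `std::runtime_error` shown above.

```javascript
// Hedged illustration only; NOT llama.cpp's implementation. A
// grammar-constrained sampler keeps a set of candidate parse stacks and
// advances each surviving stack one character at a time. If no stack
// survives an accepted piece, the piece was not derivable from the
// grammar, which is the "empty grammar stack" condition in the log.
function acceptPiece(stacks, piece) {
  for (const ch of piece) {
    const next = [];
    for (const stack of stacks) {
      // The next expected character sits at the top (end) of the stack.
      if (stack[stack.length - 1] === ch) {
        next.push(stack.slice(0, -1)); // consume the matched character
      }
    }
    stacks = next;
  }
  if (stacks.length === 0) {
    throw new Error(`Unexpected empty grammar stack after accepting piece: ${piece}`);
  }
  return stacks;
}

// A stack expecting the literal "<tool_call>", top of stack at the end.
const stacks = [[..."<tool_call>"].reverse()];
console.log(acceptPiece(stacks, "<tool_call>")); // one fully-consumed stack survives
```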