
Conversation

@createthis commented Aug 23, 2025:

This PR enables DeepSeek V3.1 thinking mode as the default. Disable with --reasoning-budget 0.

Addresses #15496

openhands-agent and others added 14 commits August 22, 2025 13:31
- Added COMMON_CHAT_FORMAT_DEEPSEEK_V3_1 enum value
- Created common_chat_params_init_deepseek_v3_1() function (currently uses R1 implementation)
- Created common_chat_parse_deepseek_v3_1() function that handles V3.1 thinking format:
  - Extracts reasoning content before '</think>' tag into reasoning_content
  - Extracts regular content after '</think>' tag into content
  - No opening '<think>' tag in V3.1 format
- Added detection logic for V3.1 templates based on pattern: 'message['prefix'] is defined and message['prefix'] and thinking'
- Added V3.1 case to parsing switch statement

This addresses the issue where V3.1 outputs reasoning content followed by '</think>' and then regular content without the opening '<think>' tag.
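
As a rough illustration of the V3.1 format, here is a minimal sketch of the split, assuming the whole completion is buffered (a hypothetical helper, not the PR's actual parser, which works incrementally and also handles tool calls):

```cpp
#include <string>
#include <utility>

// V3.1 emits "<reasoning></think><content>" with no opening <think> tag:
// everything before the first </think> is reasoning, everything after it
// is regular content.
static std::pair<std::string, std::string> split_v3_1_thinking(const std::string & output) {
    static const std::string end_tag = "</think>";
    const size_t pos = output.find(end_tag);
    if (pos == std::string::npos) {
        // No end tag (yet): with thinking forced open, treat it all as reasoning.
        return { output, "" };
    }
    return { output.substr(0, pos), output.substr(pos + end_tag.size()) };
}
```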
@github-actions bot added the "testing" label Aug 23, 2025
data.thinking_forced_open = true;
}
return data;
}
@CISC (Collaborator):
I realize you probably didn't want to tackle tool calls in this PR, but I'm pretty sure this will break when there are tool calls if you don't handle them.

@createthis (Author) commented Aug 24, 2025:

@CISC I had ChatGPT 5 generate a python script for a simple tool call test: https://github.com/createthis/llama_cpp_deepseek_v3_1_think_tags/blob/main/tool_test.py

This works for me with gpt-oss-120b and master. We can use this as our control test.

I recorded the successful conversations with tool_test.py and gpt-oss-120b twice, once without --verbose set in llama.cpp and once with. The files for the recordings are in the same repo:

  • full_traffic_master_tool_test_gpt_oss_120b.mitm
  • full_traffic_master_tool_test_verbose_gpt_oss_120b.mitm

When the test works, the output looks like this:

(base) jesse@Jesses-MacBook-Pro llama.cpp % python ../tool_test.py

ASSISTANT:
 The current time in Tokyo is **2025‑08‑24 22:41 JST**.
(base) jesse@Jesses-MacBook-Pro llama.cpp %

@createthis createthis requested a review from ngxson as a code owner August 25, 2025 01:50
@createthis createthis marked this pull request as draft August 25, 2025 03:34
@createthis createthis marked this pull request as ready for review August 25, 2025 05:42
Tool calls work in both thinking and non-thinking modes. However, I've introduced a regression in streaming mode where reasoning content initially comes through as regular content. I need to think about how to handle this long term.
@createthis createthis requested a review from CISC August 29, 2025 18:22
@CISC (Collaborator) commented Aug 29, 2025:

Also, address this CI failure: https://github.com/ggml-org/llama.cpp/actions/runs/17274346584/job/49214116367

@createthis (Author):

@CISC you got it 9056707

common/chat.cpp Outdated
* Takes a prefix regex that must have 1 group to capture the function name, a closing suffix, and expects json parameters in between.
* Aggregates the prefix, suffix and in-between text into the content.
*/
static void parse_json_tool_calls_deepseek_v3_1(
@CISC (Collaborator):

I'm not entirely sure why you made this function instead of changing parse_json_tool_calls?

@createthis (Author):

I didn't want to accidentally break anything by altering the current behavior. I don't think there are a lot of unit tests for this function. I'm happy to just replace it though if you want.

@createthis (Author):

Now I know if I replace it, it breaks Functionary v3.2 tests at a minimum.

@createthis (Author):

I merged them using an optional update_cursor argument.

@createthis (Author):

@CISC would you please re-review? I just finished running the Aider Polyglot Benchmark with Q2_K_XL in thinking mode. This branch seems to be performing pretty well.

[Screenshot 2025-08-31 8:54 AM: Aider Polyglot Benchmark results]

@sgoll:

@createthis When I tested this branch a couple of days ago with llama-server and the integrated web frontend, I wasn't able to get any response out of the model: it would apparently "think", but no response would be shown at all (only the busy indicator). Is this something that was to be expected and would now be fixed before the merge?

@createthis (Author):

@sgoll I don't use the built-in webui. I use Open WebUI, which works fine. I just tested the built-in webui and I'm seeing the same behavior. Thanks for the report. I'll try to figure out why that's happening.

@createthis (Author):

@sgoll It works in the builtin webui with --reasoning-budget 0. That's something at least. Still investigating.

@createthis (Author):

@sgoll The builtin webui uses reasoning_format: none, but I wasn't supporting it. This should be fixed by 7795594.

@createthis (Author):

[Screenshot 2025-08-31 10:44 PM: built-in webui rendering with reasoning_format: none]

Note that reasoning_format: none is pretty basic. Open WebUI formats it better because it uses reasoning_format: auto:

[Screenshot 2025-08-31 10:45 PM: Open WebUI rendering with reasoning_format: auto]
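
Concretely, for a hypothetical raw completion `I'm thinking</think>Hello`, the two settings differ in what the client receives:

- reasoning_format: none -> content is "I'm thinking</think>Hello" and the client must handle the tag itself
- reasoning_format: auto -> reasoning_content is "I'm thinking" and content is "Hello"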

@createthis createthis requested a review from CISC August 30, 2025 04:57
@@ -678,7 +679,8 @@ static void parse_json_tool_calls(
const common_regex & close_regex,
const std::optional<common_regex> & block_close,
bool allow_raw_python = false,
const std::function<std::string(const common_chat_msg_parser::find_regex_result & fres)> & get_function_name = nullptr) {
const std::function<std::string(const common_chat_msg_parser::find_regex_result & fres)> & get_function_name = nullptr,
bool update_cursor = false) {
@CISC (Collaborator):

I'm struggling to understand why the update_cursor changes are necessary, why exactly do you need to do this?

I see that the tests fail without it, but the tests are also altered compared to all the other formats...

@createthis (Author) commented Aug 31, 2025:

> I see that the tests fail without it

@CISC Correct. The regexes simply do not work without this change. It's good to see my unit tests are doing their job.

@CISC (Collaborator):

But why do they require this change?

@CISC (Collaborator) commented Aug 31, 2025:

What I'm getting at is that this looks like shaping the result to fit the (possibly incorrect) test.

@createthis (Author):

@CISC What's incorrect about the test?

@CISC (Collaborator):

Well, I understand that update_cursor allows you to process multiple tool calls (which BTW, should be applicable to several models, including R1), however the first few tests are single tool calls, and they fail when update_cursor is false because the end tag is left unconsumed, thus failing to parse as a tool call.
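
To make that concrete, here is a rough sketch of the cursor behavior under discussion (simplified, illustrative names; not the actual common_chat_msg_parser API):

```cpp
#include <string>

// Simplified model of the parser position ("cursor") over the raw output.
struct cursor {
    const std::string * input;
    size_t pos = 0;
};

// After matching one tool call, the parser must advance past the close tag.
// If it doesn't (update_cursor == false), the unconsumed tag is still
// sitting at the cursor, and the remainder fails to parse as a tool call.
static bool consume_tool_call(cursor & c, bool update_cursor) {
    const std::string close_tag = "<|tool▁call▁end|>";
    const size_t end = c.input->find(close_tag, c.pos);
    if (end == std::string::npos) {
        return false;
    }
    // ... extract the function name and JSON arguments from [c.pos, end) ...
    if (update_cursor) {
        c.pos = end + close_tag.size(); // consume the close tag too
    }
    return true;
}
```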

@createthis (Author):

@CISC I'm not super smart. Dumb it down for me. Are you asking for a change? Is a specific line incorrect? Give me something to go on here.

@CISC (Collaborator):

Sorry, I didn't mean to be obtuse. My worry is that something is not quite right when the tool call fails to parse if I force update_cursor to false; intuitively this should work for single tool calls, but it doesn't.

createthis and others added 9 commits August 31, 2025 16:12
Co-authored-by: Sigbjørn Skjæret <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
Comment on lines +1368 to +1371
tool_rules.push_back(builder.add_rule(name + "-call",
"( \"<|tool▁call▁begin|>\" )? \"function<|tool▁sep|>" + name + "\\n"
"```json\\n\" " + builder.add_schema(name + "-args", parameters) + " "
"\"```<|tool▁call▁end|>\""));
@CISC (Collaborator):

This does not match function_regex.

@createthis (Author):

This code was cargo-culted from the R1 code. I see the tests fail with it removed. I'll investigate. I have to cook some swordfish for my daughter first.

@createthis (Author):

I logged this out:

std::string rule_name = name + "-call";
std::string rule_pattern = "( \"<|tool▁call▁begin|>\" )? \"function<|tool▁sep|>" + name + "\\n"
                           "```json\\n\" " + builder.add_schema(name + "-args", parameters) + " "
                           "\"```<|tool▁call▁end|>\"";
LOG_DBG("%s: add_rule: \nrule_name: %s\nrule_pattern: %s\n", __func__, rule_name.c_str(), rule_pattern.c_str());

This logs

operator(): add_rule:
rule_name: special_function-call
rule_pattern: ( "<|tool▁call▁begin|>" )? "function<|tool▁sep|>special_function\n```json\n" special-function-args "```<|tool▁call▁end|>"

function_regex

function_regex is (?:<|tool▁call▁begin|>)?function<|tool▁sep|>([^\n]+)\n```json\n

testing function_regex

Here's a link to a regex tester that implements this pattern: https://regex101.com/r/WUCUga/1
And a screenshot of the tester for good measure:
Screenshot 2025-08-31 at 7 53 33 PM

???

Admittedly, I don't fully understand llama.cpp's grammar subsystem, but this looks like it works to me. Also, if I remove this code the tests fail. Help me out. What are we talking about here?

@createthis (Author):

@CISC ^

@CISC (Collaborator) commented Sep 1, 2025:

It fails on this test (llama.cpp/tests/test-chat.cpp, lines 1796 to 1798 at 7795594):

simple_assist_msg("", "", "get_time", "{\"city\":\"Tokyo\"}"),
common_chat_parse(
"<|tool▁calls▁begin|><|tool▁call▁begin|>get_time<|tool▁sep|>{\"city\": \"Tokyo\"}<|tool▁call▁end|><|tool▁calls▁end|>",

Edit: To be clear, the regex works, but the rule pattern does not match the regex.
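
Juxtaposing the rule pattern logged earlier with this failing test's input shows one reading of the mismatch (shapes copied from the log and the test above):

grammar rule emits: ( "<|tool▁call▁begin|>" )? "function<|tool▁sep|>get_time\n```json\n" {args} "```<|tool▁call▁end|>"
test input:         <|tool▁call▁begin|>get_time<|tool▁sep|>{"city": "Tokyo"}<|tool▁call▁end|>

The rule (like the regex) expects the R1-style function<|tool▁sep|>name + fenced-JSON shape, while the test feeds the parser the bare name<|tool▁sep|>args shape.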

@createthis (Author) commented Sep 1, 2025:

@CISC No, it fails on the first test_templates call, which was cargo-culted from the R1 tests:

test_templates(tmpls.get(), end_tokens, message_assist, tools, "Hello, world!\nWhat's up?", /* expect_grammar_triggered= */ false);

changes to repro

diff --git a/common/chat.cpp b/common/chat.cpp
index 1236e766..113eec90 100644
--- a/common/chat.cpp
+++ b/common/chat.cpp
@@ -1365,10 +1365,6 @@ static common_chat_params common_chat_params_init_deepseek_v3_1(const common_cha
                 std::string name = function.at("name");
                 auto parameters = function.at("parameters");
                 builder.resolve_refs(parameters);
-                tool_rules.push_back(builder.add_rule(name + "-call",
-                    "( \"<|tool▁call▁begin|>\" )? \"function<|tool▁sep|>" + name + "\\n"
-                    "```json\\n\" " + builder.add_schema(name + "-args", parameters) + " "
-                    "\"```<|tool▁call▁end|>\""));
             });
             // Distill Qwen 7B & 32B models seem confused re/ syntax of their tool call opening tag,
             // so we accept common variants (then it's all constrained)
diff --git a/tests/test-chat.cpp b/tests/test-chat.cpp
index 4bcbf97a..42d79508 100644
--- a/tests/test-chat.cpp
+++ b/tests/test-chat.cpp
@@ -1755,6 +1755,7 @@ static void test_template_output_parsers() {
                 /* is_partial= */ false,
                 {COMMON_CHAT_FORMAT_SEED_OSS}));
     }
+    common_log_set_verbosity_thold(LOG_DEFAULT_DEBUG);
     {
         auto tmpls = read_templates("models/templates/deepseek-ai-DeepSeek-V3.1.jinja");
         std::vector<std::string>   end_tokens{ "<|end▁of▁sentence|>" };
@@ -1765,8 +1766,11 @@ static void test_template_output_parsers() {
             assert_equals(true, params.thinking_forced_open);
         }
 
+        LOG_DBG("%s: here0\n", __func__);
         test_templates(tmpls.get(), end_tokens, message_assist, tools, "</think>Hello, world!\nWhat's up?", /* expect_grammar_triggered= */ false);
+        LOG_DBG("%s: here0.1\n", __func__);
         test_templates(tmpls.get(), end_tokens, message_assist_thoughts, tools, "</think>Hello, world!\nWhat's up?", /* expect_grammar_triggered= */ false);
+        LOG_DBG("%s: here0.2\n", __func__);
         assert_msg_equals(
             simple_assist_msg("Hello, world!\nWhat's up?", "I'm\nthinking"),
             common_chat_parse(

result

Note the here0 but no here0.1:

# Reading: models/templates/deepseek-ai-DeepSeek-V3.1.jinja
Template supports tool calls but does not natively describe tools. The fallback behaviour used may produce bad results, inspect prompt w/ --verbose & consider overriding the template.
test_template_output_parsers: here0
Template supports tool calls but does not natively describe tools. The fallback behaviour used may produce bad results, inspect prompt w/ --verbose & consider overriding the template.
Template supports tool calls but does not natively describe tools. The fallback behaviour used may produce bad results, inspect prompt w/ --verbose & consider overriding the template.
unsupported grammar, left recursion detected for nonterminal at index 3
/home/jesse/llama.cpp/build-ci-debug/bin/libggml-base.so(+0x51511)[0x7a96c41c2511]

analysis

I don't know why tools is using python:

std::vector<common_chat_tool> tools { special_function_tool, python_tool };

I'm just trying to get V3.1 working, not fix the broken R1 code.
