Deepseek V3.1 thinking mode is the default #15533
base: master
Conversation
- Added COMMON_CHAT_FORMAT_DEEPSEEK_V3_1 enum value
- Created common_chat_params_init_deepseek_v3_1() function (currently uses R1 implementation)
- Created common_chat_parse_deepseek_v3_1() function that handles the V3.1 thinking format:
  - Extracts reasoning content before the '</think>' tag into reasoning_content
  - Extracts regular content after the '</think>' tag into content
  - No opening '<think>' tag in the V3.1 format
- Added detection logic for V3.1 templates based on the pattern: 'message['prefix'] is defined and message['prefix'] and thinking'
- Added V3.1 case to the parsing switch statement

This addresses the issue where V3.1 outputs reasoning content followed by '</think>' and then regular content, without the opening '<think>' tag.
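For orientation, a minimal sketch of the split described above (a hypothetical standalone helper, not the actual code in common/chat.cpp):

```cpp
#include <string>
#include <utility>

// Minimal sketch of the V3.1 thinking format: the model emits reasoning,
// then "</think>", then regular content, with no opening "<think>" tag.
// Returns {reasoning_content, content}.
static std::pair<std::string, std::string> split_v3_1_output(const std::string & out) {
    const std::string end_tag = "</think>";
    const auto pos = out.find(end_tag);
    if (pos == std::string::npos) {
        // No closing tag (yet): treat everything as reasoning.
        return { out, "" };
    }
    return { out.substr(0, pos), out.substr(pos + end_tag.size()) };
}
```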
This reverts commit c50d887.
```cpp
        data.thinking_forced_open = true;
    }
    return data;
}
```
I realize you probably didn't want to tackle tool calls in this PR, but I'm pretty sure this will break when there are tool calls if you don't.
@CISC I had ChatGPT 5 generate a Python script for a simple tool call test: https://github.com/createthis/llama_cpp_deepseek_v3_1_think_tags/blob/main/tool_test.py

This works for me with `gpt-oss-120b` and `master`. We can use this as our control test.
I recorded the successful conversations with `tool_test.py` and `gpt-oss-120b` twice, once without `--verbose` set in llama.cpp and once with. The files for the recordings are in the same repo:
- full_traffic_master_tool_test_gpt_oss_120b.mitm
- full_traffic_master_tool_test_verbose_gpt_oss_120b.mitm
When the test works, the output looks like this:
```
(base) jesse@Jesses-MacBook-Pro llama.cpp % python ../tool_test.py
ASSISTANT:
The current time in Tokyo is **2025‑08‑24 22:41 JST**.
(base) jesse@Jesses-MacBook-Pro llama.cpp %
```
Tool calls work in thinking and non-thinking modes. However, I've introduced a regression in streaming mode where reasoning content initially comes through as regular content. I need to think about how to deal with this long term.
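For context, a sketch of why streaming is awkward here (hypothetical state struct, not the actual parser): because V3.1 never emits an opening `<think>` tag, a partial stream that has not yet produced `</think>` is textually indistinguishable from regular content, so the parser has to lean on the thinking-forced-open state:

```cpp
#include <string>

// Hypothetical illustration of the streaming ambiguity; not the real
// llama.cpp parser. V3.1 omits the opening <think> tag, so chunks that
// arrive before "</think>" can only be classified as reasoning via the
// thinking_forced_open flag carried over from the chat template.
struct v3_1_stream_state {
    bool thinking_forced_open = true; // template pre-opened the think block
    bool seen_think_close     = false;
};

static bool chunk_is_reasoning(v3_1_stream_state & st, const std::string & chunk) {
    if (st.seen_think_close) {
        return false; // past </think>: regular content
    }
    if (chunk.find("</think>") != std::string::npos) {
        st.seen_think_close = true; // the tail of this chunk is already content
    }
    return st.thinking_forced_open;
}
```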
Also, address this CI failure:
function per CISC's request.
common/chat.cpp (Outdated)
```cpp
 * Takes a prefix regex that must have 1 group to capture the function name, a closing suffix, and expects json parameters in between.
 * Aggregates the prefix, suffix and in-between text into the content.
 */
static void parse_json_tool_calls_deepseek_v3_1(
```
I'm not entirely sure why you made this function instead of changing `parse_json_tool_calls`?
I didn't want to break anything existing by altering current behavior accidentally. I don't think there are a lot of unit tests for this function. I'm happy to just replace it though if you want.
Now I know if I replace it, it breaks Functionary v3.2 tests at a minimum.
I merged them using an optional `update_cursor` argument.
@CISC would you please re-review? I just finished running the Aider Polyglot Benchmark with Q2_K_XL in thinking mode. This branch seems to be performing pretty well.

@createthis When I tested this branch a couple of days ago with `llama-server` and the integrated web frontend, I wasn't able to get any response out of the model: it would apparently "think", but no response would be shown at all (only the busy indicator). Is this expected, and will it be fixed before the merge?
@sgoll I don't use the built-in webui. I use Open WebUI, which works fine. I just tested the built-in webui and I'm seeing the same behavior. Thanks for the report. I'll try to figure out why that's happening.
@sgoll It works in the built-in webui with `--reasoning-budget 0`. That's something at least. Still investigating.
behaviors by adding optional update_cursor argument.
```diff
@@ -678,7 +679,8 @@ static void parse_json_tool_calls(
     const common_regex & close_regex,
     const std::optional<common_regex> & block_close,
     bool allow_raw_python = false,
-    const std::function<std::string(const common_chat_msg_parser::find_regex_result & fres)> & get_function_name = nullptr) {
+    const std::function<std::string(const common_chat_msg_parser::find_regex_result & fres)> & get_function_name = nullptr,
+    bool update_cursor = false) {
```
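For intuition, a toy model of what the flag changes (hypothetical parser type, not the real `common_chat_msg_parser`): when `update_cursor` is true, the parse position is advanced past each consumed tool call, so subsequent matches (including the block-close tag) start after it:

```cpp
#include <cstddef>
#include <string>

// Toy illustration of the update_cursor idea; the real code operates on
// common_chat_msg_parser. With update_cursor == true the caller's
// position moves past the consumed call, so the next scan (another tool
// call, or the closing tag of the block) starts where this call ended.
struct toy_parser {
    std::string input;
    std::size_t pos = 0;
};

static void consume_tool_call(toy_parser & p, std::size_t match_end, bool update_cursor) {
    // ... extract the function name and JSON arguments from [p.pos, match_end) ...
    if (update_cursor) {
        p.pos = match_end; // leave the cursor after the call's end tag
    }
}
```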
I'm struggling to understand why the `update_cursor` changes are necessary; why exactly do you need to do this? I see that the tests fail without it, but the tests are also altered compared to all the other formats...
> I see that the tests fail without it

@CISC Correct. The regexes simply do not work without this change. It's good to see my unit tests are doing their job.
But why do they require this change?
What I'm getting at is that this looks like shaping the result to fit the (possibly incorrect) test.
@CISC What's incorrect about the test?
Well, I understand that `update_cursor` allows you to process multiple tool calls (which, BTW, should be applicable to several models, including R1); however, the first few tests are single tool calls, and they fail when `update_cursor` is `false` because the end tag is left unconsumed, thus failing to parse as a tool call.
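Spelled out against the toy sketch above (a hypothetical trace, simplified to one call):

```
input: <|tool▁call▁begin|>get_time<|tool▁sep|>{"city":"Tokyo"}<|tool▁call▁end|>

update_cursor == true:
  1. match the call, extract args  -> pos advances past <|tool▁call▁end|>
  2. nothing left to consume       -> parses cleanly as one tool call

update_cursor == false:
  1. match the call, extract args  -> pos unchanged
  2. the end tag is still pending in the unconsumed input
  3. the leftover tag makes the message fail to parse as a tool call
```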
@CISC I'm not super smart. Dumb it down for me. Are you asking for a change? Is a specific line incorrect? Give me something to go on here.
Sorry, I didn't mean to be obtuse. My worry is that something is not quite right when the tool call fails to be parsed if I force `update_cursor` to `false`; intuitively this should work for single tool calls, but it doesn't.
Co-authored-by: Sigbjørn Skjæret <[email protected]>
````cpp
tool_rules.push_back(builder.add_rule(name + "-call",
    "( \"<|tool▁call▁begin|>\" )? \"function<|tool▁sep|>" + name + "\\n"
    "```json\\n\" " + builder.add_schema(name + "-args", parameters) + " "
    "\"```<|tool▁call▁end|>\""));
````
This does not match `function_regex`.
This code was cargo-culted from the R1 code. I see the tests fail with it removed. I'll investigate. I have to cook some swordfish for my daughter first.
I logged this out:

````cpp
std::string rule_name = name + "-call";
std::string rule_pattern = "( \"<|tool▁call▁begin|>\" )? \"function<|tool▁sep|>" + name + "\\n"
    "```json\\n\" " + builder.add_schema(name + "-args", parameters) + " "
    "\"```<|tool▁call▁end|>\"";
LOG_DBG("%s: add_rule: \nrule_name: %s\nrule_pattern: %s\n", __func__, rule_name.c_str(), rule_pattern.c_str());
````
This logs:

````
operator(): add_rule:
rule_name: special_function-call
rule_pattern: ( "<|tool▁call▁begin|>" )? "function<|tool▁sep|>special_function\n```json\n" special-function-args "```<|tool▁call▁end|>"
````
`function_regex` is:

````
(?:<|tool▁call▁begin|>)?function<|tool▁sep|>([^\n]+)\n```json\n
````
Testing `function_regex`: here's a link to a regex tester that implements this pattern: https://regex101.com/r/WUCUga/1

And a screenshot of the tester for good measure:
???
Admittedly, I don't fully understand llama.cpp's grammar subsystem, but this looks like it works to me. Also, if I remove this code the tests fail. Help me out. What are we talking about here?
@CISC ^
It fails on this test (lines 1796 to 1798 in 7795594):

```cpp
simple_assist_msg("", "", "get_time", "{\"city\":\"Tokyo\"}"),
common_chat_parse(
    "<|tool▁calls▁begin|><|tool▁call▁begin|>get_time<|tool▁sep|>{\"city\": \"Tokyo\"}<|tool▁call▁end|><|tool▁calls▁end|>",
```
Edit: To be clear, the regex works, but the rule pattern does not match the regex.
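To make the shapes concrete (illustrative comparison with NAME/ARGS placeholders, taken from the logged rule pattern, the regex, and the test string above):

````
rule pattern / function_regex expect:  (<|tool▁call▁begin|>)? function<|tool▁sep|>NAME\n```json\nARGS```<|tool▁call▁end|>
the V3.1 test string contains:         <|tool▁calls▁begin|><|tool▁call▁begin|>NAME<|tool▁sep|>ARGS<|tool▁call▁end|><|tool▁calls▁end|>
````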
@CISC No, it fails on the first `test_templates` call, which was cargo-culted from the R1 tests (line 1327 in 7795594):

```cpp
test_templates(tmpls.get(), end_tokens, message_assist, tools, "Hello, world!\nWhat's up?", /* expect_grammar_triggered= */ false);
```
changes to repro:
````diff
diff --git a/common/chat.cpp b/common/chat.cpp
index 1236e766..113eec90 100644
--- a/common/chat.cpp
+++ b/common/chat.cpp
@@ -1365,10 +1365,6 @@ static common_chat_params common_chat_params_init_deepseek_v3_1(const common_cha
         std::string name = function.at("name");
         auto parameters = function.at("parameters");
         builder.resolve_refs(parameters);
-        tool_rules.push_back(builder.add_rule(name + "-call",
-            "( \"<|tool▁call▁begin|>\" )? \"function<|tool▁sep|>" + name + "\\n"
-            "```json\\n\" " + builder.add_schema(name + "-args", parameters) + " "
-            "\"```<|tool▁call▁end|>\""));
     });
     // Distill Qwen 7B & 32B models seem confused re/ syntax of their tool call opening tag,
     // so we accept common variants (then it's all constrained)
diff --git a/tests/test-chat.cpp b/tests/test-chat.cpp
index 4bcbf97a..42d79508 100644
--- a/tests/test-chat.cpp
+++ b/tests/test-chat.cpp
@@ -1755,6 +1755,7 @@ static void test_template_output_parsers() {
             /* is_partial= */ false,
             {COMMON_CHAT_FORMAT_SEED_OSS}));
     }
+    common_log_set_verbosity_thold(LOG_DEFAULT_DEBUG);
     {
         auto tmpls = read_templates("models/templates/deepseek-ai-DeepSeek-V3.1.jinja");
         std::vector<std::string> end_tokens{ "<|end▁of▁sentence|>" };
@@ -1765,8 +1766,11 @@ static void test_template_output_parsers() {
             assert_equals(true, params.thinking_forced_open);
         }
+        LOG_DBG("%s: here0\n", __func__);
         test_templates(tmpls.get(), end_tokens, message_assist, tools, "</think>Hello, world!\nWhat's up?", /* expect_grammar_triggered= */ false);
+        LOG_DBG("%s: here0.1\n", __func__);
         test_templates(tmpls.get(), end_tokens, message_assist_thoughts, tools, "</think>Hello, world!\nWhat's up?", /* expect_grammar_triggered= */ false);
+        LOG_DBG("%s: here0.2\n", __func__);
         assert_msg_equals(
             simple_assist_msg("Hello, world!\nWhat's up?", "I'm\nthinking"),
             common_chat_parse(
````
result:
Note the `here0` but no `here0.1`:
```
# Reading: models/templates/deepseek-ai-DeepSeek-V3.1.jinja
Template supports tool calls but does not natively describe tools. The fallback behaviour used may produce bad results, inspect prompt w/ --verbose & consider overriding the template.
test_template_output_parsers: here0
Template supports tool calls but does not natively describe tools. The fallback behaviour used may produce bad results, inspect prompt w/ --verbose & consider overriding the template.
Template supports tool calls but does not natively describe tools. The fallback behaviour used may produce bad results, inspect prompt w/ --verbose & consider overriding the template.
unsupported grammar, left recursion detected for nonterminal at index 3/home/jesse/llama.cpp/build-ci-debug/bin/libggml-base.so(+0x51511)[0x7a96c41c2511]
```
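(For background: "left recursion" is a grammar rule whose expansion can begin with the rule itself, e.g. `a ::= a "x" | "y"`; the error above is llama.cpp's grammar engine rejecting such a rule.)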
analysis:
I don't know why `tools` is using python (line 213 in 7795594):

```cpp
std::vector<common_chat_tool> tools { special_function_tool, python_tool };
```
I'm just trying to get V3.1 working, not fix the broken R1 code.
This PR enables DeepSeek V3.1 thinking mode as the default. Disable with `--reasoning-budget 0`.

Addresses #15496