@chaunceyjiang
Collaborator

@chaunceyjiang chaunceyjiang commented Dec 2, 2025

Purpose

vllm serve /mnt/data4/models/deepseek-ai/DeepSeek-V3___2 --port 8000 --tensor-parallel-size 8 --tokenizer-mode deepseek_v32

Test Plan

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

res = client.chat.completions.create(
    model=client.models.list().data[0].id,
    messages=[
        {"role": "user", "content": "hello"},
        {
            "role": "assistant",
            "content": "Hello! I am DeepSeek.",
            "reasoning": "thinking...",
        },
        {"role": "user", "content": "1+1=?"},
    ],
)

print(res)

Result


ChatCompletion(id='chatcmpl-945582535cd19223', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Hmm, the user asked "1+1=?" after the initial greeting. This is a very basic arithmetic question. \n\nThe answer is straightforward: 1+1 equals 2. No need for complex reasoning or explanations here. \n\nI can simply provide the answer directly since the user likely just wants confirmation. Adding an emoji might make the response feel friendlier.1 + 1 = 2', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[], reasoning=None, reasoning_content=None), stop_reason=None, token_ids=None)], created=1764658071, model='/mnt/data4/models/deepseek-ai/DeepSeek-V3___2', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=85, prompt_tokens=21, total_tokens=106, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None, prompt_token_ids=None, kv_transfer_params=None)


Test tool call



from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="fake")
properties = {
    "city": {
        "type": "string",
        "description": "The city to find the weather for, e.g. 'San Francisco'",
    },
    "state": {
        "type": "string",
        "description": "the two-letter abbreviation for the state that the city is"
        " in, e.g. 'CA' which would mean 'California'",
    },
    "unit": {
        "type": "string",
        "description": "The unit to fetch the temperature in",
        "enum": ["celsius", "fahrenheit"],
    },
}
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": ["city", "state", "unit"],
            },
        },
    }
]

messages = [
    {"role": "user", "content": "Hi! How are you doing today?"},
    {"role": "assistant", "content": "I'm doing well! How can I help you?"},
    {
        "role": "user",
        "content": (
            "Can you tell me what the temperate will be in Dallas, in fahrenheit?"
        ),
    },
]


response = client.chat.completions.create(
    model="",
    messages=messages,
    tools=tools,
    tool_choice="auto",
    extra_body={"chat_template_kwargs": {"thinking": True}},
)
print(response)

Result


ChatCompletion(id='chatcmpl-bd7b307e7a534bb7', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='The user wants to know the temperature in Dallas in Fahrenheit. I need to use the get_current_weather function. The function requires city, state, and unit parameters. Dallas is a city, but I need to know which state. Dallas is in Texas, so state is TX. Unit should be "fahrenheit". I\'ll call the function.</think>\n\n<|DSML|function_calls>\n<|DSML|invoke name="get_current_weather">\n<|DSML|parameter name="city" string="true">Dallas</|DSML|parameter>\n<|DSML|parameter name="state" string="true">TX</|DSML|parameter>\n<|DSML|parameter name="unit" string="true">fahrenheit</|DSML|parameter>\n</|DSML|invoke>\n</|DSML|function_calls>', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[], reasoning=None, reasoning_content=None), stop_reason=None, token_ids=None)], created=1764693657, model='/mnt/data4/models/deepseek-ai/DeepSeek-V3___2', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=151, prompt_tokens=438, total_tokens=589, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None, prompt_token_ids=None, kv_transfer_params=None)

Test reasoning

vllm serve /mnt/data4/models/deepseek-ai/DeepSeek-V3___2 --port 8000 --tensor-parallel-size 8 --tokenizer-mode deepseek_v32 --reasoning-parser deepseek_v3

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="fake")
properties = {
    "city": {
        "type": "string",
        "description": "The city to find the weather for, e.g. 'San Francisco'",
    },
    "state": {
        "type": "string",
        "description": "the two-letter abbreviation for the state that the city is"
        " in, e.g. 'CA' which would mean 'California'",
    },
    "unit": {
        "type": "string",
        "description": "The unit to fetch the temperature in",
        "enum": ["celsius", "fahrenheit"],
    },
}
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": ["city", "state", "unit"],
            },
        },
    }
]

messages = [
    {"role": "user", "content": "Hi! How are you doing today?"},
    {"role": "assistant", "content": "I'm doing well! How can I help you?"},
    {
        "role": "user",
        "content": (
            "Can you tell me what the temperate will be in Dallas, in fahrenheit?"
        ),
    },
]


response = client.chat.completions.create(
    model="",
    messages=messages,
    tools=tools,
    tool_choice="auto",
    extra_body={"chat_template_kwargs": {"thinking": True}},
)
print(response)

Result


ChatCompletion(id='chatcmpl-937941f2e6a1101a', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='\n\n<|DSML|function_calls>\n<|DSML|invoke name="get_current_weather">\n<|DSML|parameter name="city" string="true">Dallas</|DSML|parameter>\n<|DSML|parameter name="state" string="true">TX</|DSML|parameter>\n<|DSML|parameter name="unit" string="true">fahrenheit</|DSML|parameter>\n</|DSML|invoke>\n</|DSML|function_calls>', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[], reasoning='The user wants to know the temperature in Dallas, in Fahrenheit. I need to use the get_current_weather tool. But the tool requires city, state, and unit. They gave me Dallas, but not the state. Dallas is in Texas, so the state would be TX. And unit is fahrenheit. So I need to call the function with city: Dallas, state: TX, unit: fahrenheit. Let me do that.', reasoning_content='The user wants to know the temperature in Dallas, in Fahrenheit. I need to use the get_current_weather tool. But the tool requires city, state, and unit. They gave me Dallas, but not the state. Dallas is in Texas, so the state would be TX. And unit is fahrenheit. So I need to call the function with city: Dallas, state: TX, unit: fahrenheit. Let me do that.'), stop_reason=None, token_ids=None)], created=1764694025, model='/mnt/data4/models/deepseek-ai/DeepSeek-V3___2', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=170, prompt_tokens=438, total_tokens=608, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None, prompt_token_ids=None, kv_transfer_params=None)
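
The two tool-call results differ only in where the text before </think> ends up: without --reasoning-parser it stays in content, and with it the text moves to reasoning. A minimal sketch of that split, assuming the reasoning always ends at the first </think> marker. split_reasoning is a hypothetical helper for illustration, not the parser that vLLM ships.

```python
from __future__ import annotations

THINK_END = "</think>"  # end-of-reasoning marker seen in the raw output above


def split_reasoning(text: str) -> tuple[str | None, str]:
    """Split raw model text into (reasoning, content) at the first </think>."""
    head, sep, tail = text.partition(THINK_END)
    if not sep:
        # No marker found: treat everything as content.
        return None, text
    # Content keeps its leading newlines, as in the second result above.
    return head.strip(), tail


raw = 'Unit should be "fahrenheit". I\'ll call the function.</think>\n\n<|DSML|function_calls>'
reasoning, content = split_reasoning(raw)
print(repr(reasoning))
print(repr(content))
```

Text without the marker is returned unchanged as content, which is an assumption of this sketch rather than documented server behavior.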


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting a before/after comparison of the results, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added the deepseek label Dec 2, 2025
@mergify

mergify bot commented Dec 2, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @chaunceyjiang.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@Xu-Wenqing
Contributor

@chaunceyjiang It seems we need a new tool_call parser. Are you planning to do this? If not, I can take it.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Collaborator

@yeqcharlotte yeqcharlotte left a comment

Let's add test_input..json and test_output..txt to the unit tests. They are available at https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Speciale/tree/main/encoding

@yeqcharlotte yeqcharlotte added the ready label Dec 2, 2025
@Xu-Wenqing
Contributor

Xu-Wenqing commented Dec 2, 2025

@chaunceyjiang the function call content in this PR is:

<function_calls>\n<invoke name="get_current_weather">\n<parameter name="city" string="true">Dallas</parameter>\n<parameter name="state" string="true">TX</parameter>\n<parameter name="unit" string="true">fahrenheit</parameter>\n</invoke>\n</function_calls>

but the official format looks like

<|DSML|function_calls>
<|DSML|invoke name="get_datetime">
<|DSML|parameter name="timezone" string="true">Asia/Shanghai</|DSML|parameter>
</|DSML|invoke>
</|DSML|function_calls>

It seems several DSML tokens (|DSML|) are missing.

@chaunceyjiang
Collaborator Author

It seems several DSML tokens (|DSML|) are missing.

@Xu-Wenqing Fixed. PTAL.
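
For reference, the DSML block quoted above can be turned into structured tool calls with a couple of regular expressions. This is a rough sketch only, not the tool_call parser discussed in this thread: parse_dsml_tool_calls is a hypothetical name, the markers are matched in the ASCII form shown here, and treating string="false" values as JSON is an assumption.

```python
import json
import re

# One <|DSML|invoke ...> ... </|DSML|invoke> element per tool call.
INVOKE_RE = re.compile(
    r'<\|DSML\|invoke name="(?P<name>[^"]+)">(?P<body>.*?)</\|DSML\|invoke>',
    re.DOTALL,
)
# One <|DSML|parameter ...> element per argument inside an invoke body.
PARAM_RE = re.compile(
    r'<\|DSML\|parameter name="(?P<name>[^"]+)" string="(?P<is_str>true|false)">'
    r'(?P<value>.*?)</\|DSML\|parameter>',
    re.DOTALL,
)


def parse_dsml_tool_calls(text: str) -> list:
    """Extract [{'name': ..., 'arguments': {...}}, ...] from DSML text."""
    calls = []
    for invoke in INVOKE_RE.finditer(text):
        arguments = {}
        for param in PARAM_RE.finditer(invoke.group("body")):
            raw = param.group("value")
            # Assumption: string="true" is a plain string, otherwise JSON.
            is_str = param.group("is_str") == "true"
            arguments[param.group("name")] = raw if is_str else json.loads(raw)
        calls.append({"name": invoke.group("name"), "arguments": arguments})
    return calls


sample = (
    '\n\n<|DSML|function_calls>\n'
    '<|DSML|invoke name="get_current_weather">\n'
    '<|DSML|parameter name="city" string="true">Dallas</|DSML|parameter>\n'
    '<|DSML|parameter name="state" string="true">TX</|DSML|parameter>\n'
    '<|DSML|parameter name="unit" string="true">fahrenheit</|DSML|parameter>\n'
    '</|DSML|invoke>\n'
    '</|DSML|function_calls>'
)
calls = parse_dsml_tool_calls(sample)
print(calls)
```

A production parser would also need to handle streaming output and malformed blocks, which this sketch ignores.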

@mondaylord

Hi, I installed the latest vllm with

uv pip install vllm --pre --system --extra-index-url https://wheels.vllm.ai/nightly --extra-index-url https://download.pytorch.org/whl/cu128 --index-strategy unsafe-best-match

but transformers_utils is not the latest code; it is still at tag 0.11.2.

I also tried

uv pip install vllm --system --torch-backend=auto --extra-index-url https://wheels.vllm.ai/5cdd66450910589c8e1a3d25e80711b0b6e51eb1

but transformers_utils is still unchanged, so --tokenizer-mode deepseek_v32 fails with:

ImportError: cannot import name 'uses_xdrope_dim' from 'vllm.transformers_utils.config' (/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/config.py)
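
When an install does not seem to pick up the expected wheel, it can help to confirm which version and which files the interpreter actually sees. A small diagnostic sketch using only the standard library; installed_version and module_origin are hypothetical helper names, and neither call imports vllm itself.

```python
from importlib import metadata, util


def installed_version(package: str):
    """Return the installed distribution version, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def module_origin(module: str):
    """Return the file a top-level module would be loaded from, or None."""
    try:
        spec = util.find_spec(module)
    except (ImportError, ValueError):
        return None
    return spec.origin if spec else None


# Shows the version string pip/uv recorded and the path Python would import.
print("vllm version:", installed_version("vllm"))
print("vllm location:", module_origin("vllm"))
```

If the reported version is still a release tag rather than a nightly or commit build, the resolver most likely selected a wheel from a different index than intended.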
