[WIP] React use dspy.ToolCalls
#8472
Conversation
@chenmoneygithub Not 100% sure, but the regression you see may be due to how the message history and tool call history/results are passed back to the model provider. It looks like the trajectory is currently still formatted as a string, but you may get better results if you build a full message stack and pass it back to the model provider with the right content types for tool calls and tool call results. Have you tried this?
@ryanh-ai Thanks for the suggestion! I assume you mean formatting the trajectory as multiturn messages instead of one big JSON? I have tried that, but it doesn't produce meaningful improvements. I have kind of spotted the problem: the LM does a worse job of understanding nested type requirements, like …
I mean passing tool results back as tool-result content blocks as part of the user turn, and the same for the assistant turn's text content, etc. Perhaps that is what your experiment was, but I wanted to be clear. Here is the page in LiteLLM: https://docs.litellm.ai/docs/completion/function_call I know some model providers use the structured message schema and content types in the way they format what the LLM sees.
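For concreteness, here is a minimal sketch of the message shape being described, following the OpenAI-style schema that LiteLLM documents at the link above; the tool name, call ID, and model here are made up for illustration:

```python
import json

import litellm

messages = [
    {"role": "user", "content": "What is the weather in Tokyo?"},
    # Assistant turn: the model's text plus its structured tool calls.
    {
        "role": "assistant",
        "content": "Let me look that up.",
        "tool_calls": [
            {
                "id": "call_1",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": json.dumps({"city": "Tokyo"}),
                },
            }
        ],
    },
    # Tool result passed back as its own typed message, rather than as a
    # string pasted into the next user prompt.
    {
        "role": "tool",
        "tool_call_id": "call_1",
        "content": json.dumps({"temp_c": 21}),
    },
]

response = litellm.completion(model="gpt-4o-mini", messages=messages)
```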
@ryanh-ai Thanks! So you mean native function calling. We have noticed that native function calling yields very poor quality. If you are interested in helping us improve DSPy, I would like to see your experimental results on how it works. Thank you!
Sounds good! I have not done it with DSPy but was trying to, hence coming across this thread. I have done it outside DSPy with a couple of providers. Let me see if I can implement it in DSPy when I find some time for a test.
Refactor dspy.ReAct to use dspy.ToolCalls for consistency.
We are keeping the behavior that when JSONAdapter is used with ToolCalls, we direct the request to ChatAdapter for good quality, because we have consistently noticed that models do poorly when combining structured output with `dict[str, Any]`; a rough sketch of this fallback is below.
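This is not the actual implementation, just a minimal sketch of that fallback under the assumption that it lives at the adapter boundary; `FallbackJSONAdapter` and its internals are illustrative names, not dspy code:

```python
import dspy


class FallbackJSONAdapter(dspy.JSONAdapter):
    """Illustrative sketch only: if the signature asks for a ToolCalls
    output, skip structured JSON output and use plain chat formatting."""

    def __call__(self, lm, lm_kwargs, signature, demos, inputs):
        needs_tool_calls = any(
            field.annotation is dspy.ToolCalls
            for field in signature.output_fields.values()
        )
        if needs_tool_calls:
            # Structured output + dict[str, Any] has been unreliable,
            # so route the call through ChatAdapter instead.
            return dspy.ChatAdapter()(lm, lm_kwargs, signature, demos, inputs)
        return super().__call__(lm, lm_kwargs, signature, demos, inputs)
```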
In our experiments, native tool calling can mitigate this issue, but it is not yet producing promising results, and we are still experimenting there.

Did a quick benchmark on the Hover dataset for this PR, and we see a pretty clear quality regression.
My theory is that all these LMs do a worse job of understanding deeply nested output types than flat ones. In detail, `dspy.ToolCalls` is a nested type holding a list of `dspy.ToolCalls.ToolCall` entries, each of which has two fields: a string for the tool name and a dict for the tool args. By comparison, the current ReAct uses `next_tool_name`, a single string, and `next_tool_args`, a dict. So this PR introduces too much nesting for the LM to maintain decent quality; the sketch below contrasts the two output shapes.
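Here is a simplified, illustrative pair of signatures for that comparison (the real ReAct also carries a thought field and constrains the tool name to the available tools):

```python
from typing import Any

import dspy


# This PR: one nested output field. dspy.ToolCalls wraps a list of
# ToolCall entries, each with `name: str` and `args: dict[str, Any]`,
# so the LM must produce two levels of nesting.
class ReActNested(dspy.Signature):
    trajectory: str = dspy.InputField()
    tool_calls: dspy.ToolCalls = dspy.OutputField()


# Current ReAct: flat output fields, each a simple string or dict.
class ReActFlat(dspy.Signature):
    trajectory: str = dspy.InputField()
    next_tool_name: str = dspy.OutputField()
    next_tool_args: dict[str, Any] = dspy.OutputField()
```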
All benchmarks were run on 50 examples from the Hover dataset, with ChatAdapter. Benchmark script:
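The script itself is not included in this excerpt; below is a hypothetical reconstruction of what such a run might look like, where the model choice, the `load_hover` loader, and the metric are all stand-ins rather than the actual benchmark code:

```python
import dspy

# Illustrative model choice; benchmarks above were run with ChatAdapter.
dspy.configure(
    lm=dspy.LM("openai/gpt-4o-mini"),
    adapter=dspy.ChatAdapter(),
)

colbert = dspy.ColBERTv2(url="http://20.102.90.50:2017/wiki17_abstracts")


def search(query: str) -> list[str]:
    """Toy retrieval tool over a public ColBERTv2 wiki index."""
    return [r["text"] for r in colbert(query, k=3)]


react = dspy.ReAct("claim -> titles: list[str]", tools=[search])


def title_recall(example, pred, trace=None):
    """Fraction of gold supporting titles recovered by the program."""
    gold = set(example.titles)
    return len(gold & set(pred.titles)) / max(1, len(gold))


devset = load_hover(n=50)  # hypothetical loader for 50 HoVer examples

dspy.Evaluate(
    devset=devset,
    metric=title_recall,
    num_threads=8,
    display_progress=True,
)(react)
```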