Benchmarking agents in the context of full tools, that do not live under A2A. #1
Replies: 1 comment 1 reply
Hi Barry, thanks for the question. I'm not sure I completely understand the second part of the question, but here are some relevant thoughts:
I'm not sure whether any of these thoughts are directly relevant to your question, but I'm happy to follow up if you have further thoughts or clarifications.
Often, we wish to benchmark a set of related agents in the context of "higher-level" tools, such as gemini-cli or Warp. Since these tools are not A2A agents, how would one fit that type of benchmarking into AgentBeats?
Say I write an agent that helps with a particular aspect of code development, for example an agent that "knows CUDA programming tricks for high performance". This agent doesn't know how to compile or run CUDA code, so I would like an assessor that evaluates the agent when it is used with, for example, gemini-cli, Warp, or similar tools (with a variety of different LLMs).
What are everyone's thoughts on how to fit this into AgentBeats? For example, should we wrap gemini-cli and Warp as A2A agents, or take some other approach?
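To make the wrapping option concrete, here is a minimal sketch of the adapter half only, assuming the tool can be driven non-interactively by passing the prompt as its final command-line argument. `CliToolAgent` and `handle_message` are hypothetical names, not part of AgentBeats or any A2A SDK, and the A2A server layer that would call the adapter is left out. A stand-in command built from the Python interpreter replaces the real tool, since the actual invocation for gemini-cli or Warp would need to be checked against their documentation.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CliToolAgent:
    """Hypothetical adapter: one incoming task message -> one CLI invocation."""

    name: str
    command: list[str]  # argv prefix; the prompt is appended as the last argument
    timeout_s: float = 300.0

    def handle_message(self, text: str) -> dict:
        """Run the tool once and return a plain dict an A2A layer could serialize."""
        try:
            proc = subprocess.run(
                [*self.command, text],
                capture_output=True,
                text=True,
                timeout=self.timeout_s,
            )
        except subprocess.TimeoutExpired:
            return {"agent": self.name, "ok": False, "output": "", "error": "timeout"}
        except FileNotFoundError:
            return {"agent": self.name, "ok": False, "output": "", "error": "command not found"}
        return {
            "agent": self.name,
            "ok": proc.returncode == 0,
            "output": proc.stdout.strip(),
            "error": proc.stderr.strip(),
        }


# Stand-in tool (assumption): upper-cases the prompt so the sketch runs anywhere.
echo_tool = CliToolAgent(
    name="echo-tool",
    command=[sys.executable, "-c", "import sys; print(sys.argv[1].upper())"],
)
result = echo_tool.handle_message("optimize this kernel")
print(result["ok"], result["output"])
```

An assessor could then hold one such adapter per tool and LLM combination and send the same task to each, which keeps the comparison logic independent of how each tool is launched.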
Thanks!