forked from OpenHands/OpenHands
Open
Labels
enhancement: New feature or request
Description
What problem or use case are you trying to solve?
Add DiscoveryBench to OpenHands' evaluation suite. DiscoveryBench contains 264 tasks collected across 6 diverse domains, such as biology, economics, and sociology.
It incorporates discovery workflows from published papers to approximate the real-world challenges faced by researchers.
https://github.com/allenai/discoverybench/
https://x.com/mbodhisattwa/status/1811524569410531333
Do you have thoughts on the technical implementation?
The implementation will consist of:
- An inference script to solve a DiscoveryBench task (goal & datasets)
- A faceted evaluation script to rigorously evaluate the answers
- Documentation for OpenHands users
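To make the first two pieces concrete, here is a rough sketch of the shapes they might take. Everything in it is an assumption for discussion, not the planned code: `DiscoveryTask`, `build_instruction`, and `faceted_score` are hypothetical names, the facet keys are placeholders, and a plain average is only one possible way to combine facets. It does not use any OpenHands or DiscoveryBench API.

```python
from dataclasses import dataclass, field


@dataclass
class DiscoveryTask:
    """Hypothetical container for one task: a goal plus its datasets."""
    goal: str
    dataset_paths: list[str] = field(default_factory=list)


def build_instruction(task: DiscoveryTask) -> str:
    """Render the task as a plain-text instruction for an agent."""
    lines = [f"Goal: {task.goal}", "Datasets:"]
    lines += [f"- {path}" for path in task.dataset_paths]
    lines.append("Report your final hypothesis in one sentence.")
    return "\n".join(lines)


def faceted_score(facet_scores: dict[str, float]) -> float:
    """Average per-facet scores in [0, 1]; facet names are placeholders."""
    if not facet_scores:
        raise ValueError("at least one facet score is required")
    for name, score in facet_scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"facet {name!r} has out-of-range score {score}")
    return sum(facet_scores.values()) / len(facet_scores)


if __name__ == "__main__":
    # Illustrative values only; not taken from the benchmark.
    task = DiscoveryTask(
        goal="Find how variable A relates to variable B.",
        dataset_paths=["data/example.csv"],
    )
    print(build_instruction(task))
    print(faceted_score({"facet_a": 1.0, "facet_b": 0.5}))
```

Keeping per-facet scores separate until the final step would let the evaluation report show which part of an answer was wrong, not just an overall number.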
Additional context
We are working on a PR for this and will seek OpenHands contributors' input to finalize it.