[Evaluation] Add DiscoveryBench Benchmark #4

@suranah

What problem or use case are you trying to solve?

Add DiscoveryBench to OpenHands' evaluation suite. DiscoveryBench contains 264 tasks collected across 6 diverse domains, such as biology, economics, and sociology. It incorporates discovery workflows from published papers to approximate the real-world challenges faced by researchers.

https://github.com/allenai/discoverybench/
https://x.com/mbodhisattwa/status/1811524569410531333

Do you have thoughts on the technical implementation?

The implementation will consist of:

  1. An inference script that solves a DiscoveryBench task (given its goal and datasets)
  2. A faceted evaluation script that rigorously evaluates the answers
  3. Documentation for OpenHands users
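
As a rough illustration of how the first two pieces could fit together, here is a minimal Python sketch. Everything in it is an assumption made for discussion: `DiscoveryTask`, `build_instruction`, and `aggregate_facets` are hypothetical names, the facet labels in the usage example are placeholders, and the unweighted mean is not necessarily how DiscoveryBench scores answers.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DiscoveryTask:
    """Hypothetical container for one task: a goal plus its dataset files."""

    goal: str
    dataset_paths: list[str] = field(default_factory=list)


def build_instruction(task: DiscoveryTask) -> str:
    """Turn a task into a plain-text instruction for the agent (item 1)."""
    datasets = "\n".join(f"- {path}" for path in task.dataset_paths)
    return (
        f"Goal: {task.goal}\n"
        f"Datasets:\n{datasets}\n"
        "State your final hypothesis on the last line."
    )


def aggregate_facets(facet_scores: dict[str, float]) -> float:
    """Combine per-facet scores in [0, 1] into one number (item 2).

    The unweighted mean is an assumption made for this sketch.
    """
    if not facet_scores:
        raise ValueError("at least one facet score is required")
    for name, score in facet_scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"facet {name!r} has out-of-range score {score}")
    return sum(facet_scores.values()) / len(facet_scores)
```

For example, `build_instruction(DiscoveryTask("Placeholder goal", ["data/example.csv"]))` yields a prompt listing the goal and the dataset path, and `aggregate_facets({"facet_a": 1.0, "facet_b": 0.5, "facet_c": 0.0})` returns `0.5`. The actual scripts would replace both placeholders with DiscoveryBench's own task format and evaluation logic.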

Additional context

We are working on a PR for this and will seek OpenHands contributors' input to finalize it.

Labels: enhancement
