@manasa-manoj-nbr
Contributor

Which issue does this PR close?

Rationale for this change

DataFusion has an extensive benchmarking infrastructure covering many benchmark types (TPCH, ClickBench, IMDB, H2O.ai, micro-benchmarks, etc.), but its documentation is scattered across README files and code comments. Contributors need a centralized, easily discoverable resource that explains which benchmarks are available, how to use them to validate performance changes, and where to add new benchmark code. This addresses the maintainer's request in issue #17811 for a dedicated documentation page describing all the benchmark code we have.

What changes are included in this PR?

  • Created docs/source/contributor-guide/benchmarking.md: A comprehensive documentation page covering all DataFusion benchmarks, organized by categories (Performance Benchmarks, Specialized Benchmarks, Micro-benchmarks)
  • Updated docs/source/index.rst: Added the new benchmarking page to the Contributor Guide navigation structure
  • Updated docs/source/contributor-guide/testing.md: Added cross-reference to the new dedicated benchmarking page in the existing benchmarks section

The new documentation consolidates information about:

  • All major benchmark suites (TPCH, ClickBench, IMDB, H2O.ai, Sort, External Aggregation, etc.)
  • Usage instructions for the bench.sh script and the dfbench binary
  • Configuration options and environment variables
  • Guidelines for adding new benchmarks
  • Troubleshooting common issues

Are these changes tested?

  • Documentation builds successfully without warnings or errors
  • Navigation structure tested - new page appears correctly in Contributor Guide menu
  • Internal links verified - all cross-references and links work properly
  • Content accuracy verified - all benchmark information sourced from official /benchmarks/README.md and existing documentation

Are there any user-facing changes?

No breaking changes:

  • No changes to APIs, CLIs, or runtime behavior
  • No changes to existing benchmark functionality
  • Purely additive documentation enhancement

@github-actions bot added the documentation label on Oct 29, 2025
@2010YOUY01
Contributor

Thank you for the contribution.

The original issue suggests adding a contributor guide page for the micro-benchmarks scattered across the codebase, whereas this PR covers the end-to-end benchmarks. We already have a doc for those: https://github.com/apache/datafusion/blob/main/benchmarks/README.md
I think it's a good idea to move that doc into the contributor guide, so we don't have to generate a new one.

Successfully merging this pull request may close these issues.

Add a page to describe the bench code we have.
