refactor: include metric output_batches into BaselineMetrics #18491
Hey @2010YOUY01, please review when you get time.
2010YOUY01 left a comment
Thank you, this PR is well written.
This is good to go once the minor suggestions are addressed and tests pass.
@2010YOUY01, I've made the requested changes; please take a look.
It seems that with the https://github.com/apache/datafusion/actions/runs/19165362013/job/54785692452?pr=18491
@2010YOUY01, can you give it another go? I updated the test and added baseline metrics in
@2010YOUY01, it seems that #18262 will have proper baseline metrics for the Should we remove the basic
2010YOUY01 left a comment
LGTM, thanks again!
> Should we remove the basic RepartitionExec BaselineMetrics I added here (from which we currently take output_batches), or are we okay with keeping it, since it should be fixed once #18262 is merged?
It's okay to keep it. We just merged a big change to RepartitionExec, so #18262 will have to be reworked, unfortunately.
…18491)

## Which issue does this PR close?

- Closes apache#17027

## Rationale for this change

`output_batches` should be a metric common to all operators, so it should ideally be part of `BaselineMetrics`. Example:

```
> explain analyze select * from generate_series(1, 1000000) as t1(v1) order by v1 desc;
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type         | plan |
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | SortExec: expr=[v1@0 DESC], preserve_partitioning=[false], metrics=[output_rows=1000000, elapsed_compute=535.320324ms, output_bytes=7.6 MB, output_batches=123, spill_count=0, spilled_bytes=0.0 B, spilled_rows=0, batches_split=0] |
|                   |   ProjectionExec: expr=[value@0 as v1], metrics=[output_rows=1000000, elapsed_compute=208.379µs, output_bytes=7.7 MB, output_batches=123] |
|                   |     LazyMemoryExec: partitions=1, batch_generators=[generate_series: start=1, end=1000000, batch_size=8192], metrics=[output_rows=1000000, elapsed_compute=15.924291ms, output_bytes=7.7 MB, output_batches=123] |
|                   | |
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.492 second
```

## What changes are included in this PR?

- Added `output_batches` to `BaselineMetrics` with the `DEV` `MetricType`
- Tracked through the `record_poll()` API
- Changes are similar to apache#18268
- Refactored the `assert_metrics` macro to take multiple metric strings for the substring check
- Added `output_bytes` and `output_batches` tracking to the `TopK` operator
- Added `baseline` metrics for `RepartitionExec`

## Are these changes tested?

Added unit tests.

## Are there any user-facing changes?

The `EXPLAIN ANALYZE` output changes: `output_batches` is added to `metrics=[...]`.
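The idea of counting batches as they pass through `record_poll()` can be illustrated with a minimal, std-only Rust sketch. Everything in it is an assumption for illustration: `BaselineMetricsSketch`, `Batch`, and `run_source` are hypothetical stand-ins, not DataFusion types, and the real `BaselineMetrics` uses its own metric types rather than bare atomics. The sketch also shows where `output_batches=123` in the example plan plausibly comes from: with `batch_size=8192`, 1,000,000 rows split into 122 full batches plus a final batch of 576 rows.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::task::Poll;

/// Hypothetical stand-in for an Arrow `RecordBatch`; only the row count matters here.
struct Batch {
    num_rows: usize,
}

/// Hypothetical, simplified model of per-operator baseline metrics.
/// NOT DataFusion's `BaselineMetrics`; names are illustrative only.
#[derive(Default)]
struct BaselineMetricsSketch {
    output_rows: AtomicUsize,
    output_batches: AtomicUsize,
}

impl BaselineMetricsSketch {
    /// Inspect a stream poll result, record metrics for a produced batch,
    /// and pass the poll result through unchanged. `String` stands in for
    /// the real error type.
    fn record_poll(
        &self,
        poll: Poll<Option<Result<Batch, String>>>,
    ) -> Poll<Option<Result<Batch, String>>> {
        if let Poll::Ready(Some(Ok(batch))) = &poll {
            self.output_rows.fetch_add(batch.num_rows, Ordering::Relaxed);
            // One increment per batch handed to the parent operator.
            self.output_batches.fetch_add(1, Ordering::Relaxed);
        }
        poll
    }
}

/// Hypothetical source operator: emits `total_rows` rows in batches of at
/// most `batch_size` rows, routing every poll through `record_poll`.
fn run_source(metrics: &BaselineMetricsSketch, total_rows: usize, batch_size: usize) {
    let mut remaining = total_rows;
    loop {
        let poll = if remaining == 0 {
            Poll::Ready(None) // end of stream
        } else {
            let n = remaining.min(batch_size);
            remaining -= n;
            Poll::Ready(Some(Ok(Batch { num_rows: n })))
        };
        if let Poll::Ready(None) = metrics.record_poll(poll) {
            break;
        }
    }
}

fn main() {
    let metrics = BaselineMetricsSketch::default();
    run_source(&metrics, 1_000_000, 8192);
    // prints: output_rows=1000000, output_batches=123
    println!(
        "output_rows={}, output_batches={}",
        metrics.output_rows.load(Ordering::Relaxed),
        metrics.output_batches.load(Ordering::Relaxed)
    );
}
```

Hooking the counter into the single poll wrapper means an operator that already routes its output through `record_poll()` gets the batch count without extra bookkeeping.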
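The refactored `assert_metrics` macro takes several metric strings and checks each one as a substring. A minimal sketch of that pattern follows, assuming a rendered metrics string as input; the macro name `assert_metrics_contain!` and its argument shape are hypothetical and do not reproduce DataFusion's actual `assert_metrics` definition.

```rust
/// Hypothetical sketch: assert that a rendered metrics string contains
/// every expected substring. Not DataFusion's actual `assert_metrics` macro.
macro_rules! assert_metrics_contain {
    ($metrics:expr, $($expected:expr),+ $(,)?) => {{
        let rendered: &str = &$metrics;
        $(
            let expected: &str = $expected;
            assert!(
                rendered.contains(expected),
                "expected metrics to contain {:?}, got {:?}",
                expected,
                rendered
            );
        )+
    }};
}

fn main() {
    // Metrics string in the style of the EXPLAIN ANALYZE output for ProjectionExec.
    let metrics = String::from(
        "output_rows=1000000, elapsed_compute=208.379µs, output_bytes=7.7 MB, output_batches=123",
    );
    assert_metrics_contain!(metrics, "output_rows=1000000", "output_batches=123");
    println!("all expected metrics present");
}
```

Checking substrings rather than the full string keeps such assertions stable when timing metrics like `elapsed_compute` vary between runs.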