@crepererum crepererum commented Sep 5, 2025

Tracking issue: https://github.com/influxdata/influxdb_iox/issues/14924

Patches

Patches map to commits 1:1 (i.e. every patch is exactly 1 commit) and are ordered for easier correlation of the description and the respective commits. They are also grouped in 3 stages.

A: Dummy

No actual patches, can be dropped at any point:

  1. a dummy patch just to get "a diff" to the base branch

B: CI Fixes

Need to get CI up and running before picking any actual patches:

  1. chore(deps): bump tracing-subscriber from 0.3.19 to 0.3.20:
    That's apache/datafusion#17355. Without it, the "security audit" CI check fails because it finds an issue in the Cargo.lock file used for the examples. This doesn't affect us, but a green CI is easier to reason about. Can be dropped with DF 50.

All commits afterwards should build cleanly!

C: Patches

These are the actual relevant patches:

  1. chore: default=true for skip_physical_aggregate_schema_check, and add warn logging:
    Needed until we chase down all warnings in our iox logs (see https://github.com/influxdata/influxdb_iox/issues/12404 ).
  2. fix: temporary fix to handle incorrect coalesce (inserted during EnforceDistribution) which later causes an error during EnforceSort (without our patch). The next DataFusion version 46 upgrade does the proper fix, which is to not insert the coalesce in the first place.:
    See EAR-5822 (also https://github.com/influxdata/influxdb_iox/issues/13310 ). Despite what the notes in "Patched DataFusion version 45.0.0" (#54) and "ParallelizeSorts, a subrule of EnforceSorting optimizer, should not remove necessary coalesce" (apache/datafusion#14691, comment) say, this is still required for DF version 46. Otherwise the regression test fails. Also see this slack thread.
  3. fix(build-wasm): put arrow-ipc/zstd dep under compression feature flag:
    That's apache/datafusion#16844. I need this for https://github.com/influxdata/datafusion-udf-wasm . Can be dropped with DF 50.
  4. Support centroids config for approx_percentile_cont_with_weight:
    That's apache/datafusion#17003. Needed so that the next patch applies cleanly. Can be dropped with DF 50.
  5. (Re)Support old syntax for approx_percentile_cont and approx_percentile_cont_with_weight:
    That's apache/datafusion#16999. Can be dropped with DF 50.
  6. feat: support distinct for window:
    That's apache/datafusion#16925, picked because a customer wants it (see https://github.com/influxdata/EAR/issues/6252 ). Can be dropped with DF 50.
  7. fix: return ALL constants in EquivalenceProperties::constants:
    That's apache/datafusion#17404. This was a regression in DF 49 that tripped our query tests. Can be dropped with DF 50.
  8. feat: Support binary data types for SortMergeJoin on clause:
    That's apache/datafusion#17431, backported. Can be dropped with DF 50.
  9. chore: skip order calculation / exponential planning:
    Even though we initially thought that this is no longer required with DF 49, it still is. Otherwise we run into https://github.com/influxdata/influxdb_iox/issues/13038 and the regression test end_to_end_cases::querier::influxrpc::read_filter::read_filter_many_tables will fail.
  10. (New) Test + workaround for SanityCheck plan:
    This is required because the previous patch (:top:) trips the sanity checker. Note though that the original commit b41808b once contained a test that no longer seems to trigger a sanity check error w/o the patch, so the test is kinda useless and was dropped.
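
For readers unfamiliar with patch 6, here is a minimal sketch of what a DISTINCT aggregate over a window computes. It is plain Rust and does not use any DataFusion API; the function name `running_count_distinct` and the choice of a running frame (first row up to the current row) are assumptions made purely for illustration, not the upstream implementation.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not part of DataFusion): for each row, count the
/// distinct values seen from the first row up to and including that row.
/// This mirrors a COUNT(DISTINCT v) evaluated over a running window frame.
fn running_count_distinct(values: &[i64]) -> Vec<usize> {
    let mut seen = HashSet::new();
    values
        .iter()
        .map(|v| {
            // Inserting a duplicate leaves the set unchanged, so repeated
            // values do not increase the count.
            seen.insert(*v);
            seen.len()
        })
        .collect()
}

fn main() {
    let counts = running_count_distinct(&[1, 1, 2, 1, 3]);
    println!("{:?}", counts);
}
```

A non-distinct count over the same frame would simply grow by one on every row; the distinct variant only grows when a value appears for the first time.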

dependabot bot and others added 7 commits September 5, 2025 13:28
)

Bumps [tracing-subscriber](https://github.com/tokio-rs/tracing) from 0.3.19 to 0.3.20.
- [Release notes](https://github.com/tokio-rs/tracing/releases)
- [Commits](tokio-rs/tracing@tracing-subscriber-0.3.19...tracing-subscriber-0.3.20)

---
updated-dependencies:
- dependency-name: tracing-subscriber
  dependency-version: 0.3.20
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…rceDistribution) which later causes an error during EnforceSort (without our patch). The next DataFusion version 46 upgrade does the proper fix, which is to not insert the coalesce in the first place.

test: recreating the iox plan:
* demonstrate the insertion of coalesce after the use of column estimates, and the removal of the test scenario's forcing of rr repartitioning

test: reproducer of SanityCheck failure after EnforceSorting removes the coalesce added in the EnforceDistribution

fix: special case to not remove the needed coalesce
…pache#17003)

* Support centroids config for `approx_percentile_cont_with_weight`

* Match two functions' signature

* Update docs

* Address comments and unify centroids config
…ntile_cont_with_weight` (apache#16999)

* Add sqllogictests

* Allow both new and old syntax for approx_percentile_cont and approx_percentile_cont_with_weight

* Update docs

* Add documentation and more tests
* feat: support distinct for window

* fix

* fix

* fix

* fix unparse

* fix test

* fix test

* easy way

* add test

* add comments
…he#17404)

* test: regression test for apache#17372

* test: add more direct regression for apache#17372

* fix: return ALL constants in `EquivalenceProperties::constants`
@alamb alamb changed the title from "Patched DF 49.0.2 (take 1)" to "Patched DF 49.0.2 (take )" Sep 8, 2025
@alamb alamb changed the title from "Patched DF 49.0.2 (take )" to "Patched DF 49.0.2 (take 2)" Sep 8, 2025
@alamb
Collaborator

alamb commented Sep 8, 2025

I changed the title to "take 2" as I think this is the second attempt

…he#17431)

* feat: Support binary data types for `SortMergeJoin` `on` clause

* Add sql level tests for merge join on binary keys

---------

Co-authored-by: Andrew Lamb <[email protected]>
@crepererum crepererum changed the title from "Patched DF 49.0.2 (take 2)" to "Patched DF 49.0.2 (take 1)" Sep 9, 2025
@crepererum
Author

> I changed the title to "take 2" as I think this is the second attempt

No, it's take 1 for 49.0.2 (the earlier attempt was for 49.0.1).

@alamb
Collaborator

alamb commented Sep 9, 2025

> chore: skip order calculation / exponential planning:

I did some research, and I think this ticket describes our current issues:

I have put it into the backlog for consideration


8 participants