
Conversation

@jonathanc-n
Contributor

commented Sep 9, 2025

Which issue does this PR close?

Rationale for this change

Adds regular joins (left, right, full, inner) for PWMJ (piecewise merge join), as they behave differently in the code path.

What changes are included in this PR?

Adds classic join + physical planner

Are these changes tested?

Yes, SLT tests + unit tests.

Follow-up work to this pull request

  • Handling partitioned queries and multiple record batches (fuzz testing will be handled with this)
  • Simplify physical planning
  • Add more unit tests for different types (another PR, as the LOC in this PR is getting a little daunting)

Next would be to implement the existence joins.

@github-actions bot added the core (Core DataFusion crate), sqllogictest (SQL Logic Tests (.slt)), and physical-plan (Changes to the physical-plan crate) labels Sep 9, 2025
@jonathanc-n marked this pull request as draft September 9, 2025 04:03
@jonathanc-n
Contributor Author

@2010YOUY01 Would you like to take a look and see if this is how you wanted to split up the work? I just wanted to put this out today; I'll clean it up this week. Only one external test is failing currently.

let join: Arc<dyn ExecutionPlan> = if join_on.is_empty() {
if join_filter.is_none() && matches!(join_type, JoinType::Inner) {
// Cross join if there are no join conditions and no join filter set
Arc::new(CrossJoinExec::new(physical_left, physical_right))
} else if num_range_filters == 1
Contributor Author

I would like to refactor this in another pull request; it's just a refactor and should be quite simple to do. I just wanted to get this version in first.

statement ok
set datafusion.execution.batch_size = 8192;

# TODO: partitioned PWMJ execution
Contributor Author

Partitioned execution currently isn't allowed; allowing it would make reviewing the tests a little messy, as many of the partitioned single-range queries would switch to PWMJ. Another follow-up, tracked in #17427.

@jonathanc-n marked this pull request as ready for review September 9, 2025 17:59
@jonathanc-n
Contributor Author

cc @2010YOUY01 @comphead, this PR is now ready!

@jonathanc-n changed the title from POC: ClassicJoin for PWMJ to feat: ClassicJoin for PWMJ Sep 9, 2025
@2010YOUY01
Contributor

This is great! I have some suggestions for the planning part, and I'll review the execution part tomorrow.

Refactor the inequality-extracting logic

I suggest moving the inequality-extracting logic from physical_planner.rs into https://github.com/apache/datafusion/blob/main/datafusion/optimizer/src/extract_equijoin_predicate.rs

The reason is that similar code is better kept in a single place instead of scattered across multiple places. The ExtractEquijoinPredicate logical optimizer rule extracts equality join predicates like t1.v1 = t2.v1; here we want to extract t1.v1 < t2.v1, so their logic should be very similar.

To do this, I think we need to extend the logical plan Join node with an extra IE-predicate field (maybe we can define a new struct for IE predicates with (Expr, Op, Expr), which we could also use in other places):

/// Join two logical plans on one or more join columns
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Join {
    ...
    /// Equijoin clause expressed as pairs of (left, right) join expressions
    pub on: Vec<(Expr, Expr)>,                                                                 
    /// Inequality clauses expressed as (left, op, right) join expression triples      <-- HERE
    pub ie_predicates: Vec<(Expr, IEOp, Expr)>,
    /// Filters applied during join (non-equi conditions)
    pub filter: Option<Expr>,
    ...
}

To keep it compatible with systems that only use the LogicalPlan API, but not the physical plans, we can also provide a utility to move the IE predicates back into the filter:

Before: 
ie_predicates: [t1.v1 < t2.v1, t1.v2 < t2.v2]
filter: (t1.v3 + t2.v3) = 100

After:
ie_predicates: []
filter: ((t1.v3 + t2.v3) = 100) AND (t1.v1 < t2.v1) AND (t1.v2 < t2.v2)

Perhaps we can open a PR just for this IE-predicate extraction task, and during initial planning simply move the IE predicates back into the filter with the above-mentioned utility.
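
A minimal sketch of what that utility could look like, assuming the proposed ie_predicates field and reusing the existing Operator enum in place of the proposed IEOp (the helper name is illustrative, not existing DataFusion API):

use datafusion_expr::{binary_expr, Expr, Operator};

/// Hypothetical helper: folds extracted IE predicates back into the join
/// filter, so consumers of the LogicalPlan API see a single filter expression.
fn fold_ie_predicates_into_filter(
    ie_predicates: Vec<(Expr, Operator, Expr)>,
    filter: Option<Expr>,
) -> Option<Expr> {
    ie_predicates
        .into_iter()
        // Rebuild each (left, op, right) triple as a binary expression ...
        .map(|(l, op, r)| binary_expr(l, op, r))
        // ... and AND it onto the existing filter, if any.
        .fold(filter, |acc, pred| match acc {
            Some(f) => Some(f.and(pred)),
            None => Some(pred),
        })
}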

Make it configurable to turn on/off PWMJ

I'll try to finish #17467 soon to make it easier, so let's put this on hold for now.

@comphead
Contributor

Thanks @jonathanc-n and @2010YOUY01

#17467 would definitely be nice to have, as PWMJ can start as an optional, experimental join, separately documented to show its benefits and limitations for the end user. The same happened with SMJ, which was an experimental feature for quite some time.

Another great way to identify performance bottlenecks is to absorb some knowledge from #17488 and keep the join more stable.

As an optional feature it is pretty safe to go; again referring to SMJ, there was a separate ticket with post-launch checks to make sure it was safe to use, like #9846.

Let me know your thoughts.

@jonathanc-n
Contributor Author

commented Sep 11, 2025

Yes, I think the experimental flag should be added first, and we can do the equality-extraction logic as a follow-up. WDYT @2010YOUY01, do you want to get #17467 in before this one?

@2010YOUY01
Contributor

> Yes, I think the experimental flag should be added first, and we can do the equality-extraction logic as a follow-up. WDYT @2010YOUY01, do you want to get #17467 in before this one?

Yes, so let's do the other work first. If I can't get #17467 done by the time this PR is ready, let's add an enable_piecewise_merge_join option here -- I think we can agree on this configuration.

@2010YOUY01
Contributor

I have gone over exec.rs and will continue with the stream implementation part soon.

ExecutionPlan, PlanProperties,
};
use crate::{DisplayAs, DisplayFormatType, ExecutionPlanProperties};

Contributor

This is one of the best module comments I have seen.

} else {
// Sort the right side in memory, so we do not need to enforce any sorting
vec![
Some(OrderingRequirements::from(self.left_sort_exprs.clone())),
Contributor

A question here for future clean-up: now that we're storing the required input ordering property inside the executor, is it possible to move it into the PlanProperties struct?

Contributor Author

If I recall correctly, I don't believe PlanProperties enforces input ordering; PlanProperties only describes output properties.
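
For orientation, the rough trait shape under discussion (approximated from the snippet above; not checked against the exact DataFusion version in this PR):

// impl ExecutionPlan for PiecewiseMergeJoinExec {
//     // Input-side: the orderings each child must provide; this is what
//     // the snippet above returns.
//     fn required_input_ordering(&self) -> Vec<Option<OrderingRequirements>> { ... }
//
//     // Output-side: partitioning, output ordering, boundedness, etc. are
//     // bundled into PlanProperties and returned here.
//     fn properties(&self) -> &PlanProperties { ... }
// }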

@github-actions bot added the common (Related to common crate) label Sep 14, 2025
@jonathanc-n
Contributor Author

@2010YOUY01 I have added the requested changes! Should be good for another go.

@jonathanc-n
Contributor Author

@comphead Should a flag be added to let this be optional, like allow_pwmj_execution or something along those lines?

@github-actions bot removed the common (Related to common crate) label Sep 15, 2025
@2010YOUY01
Contributor

I took a quick look through classic_join.rs; the general structure looks great. I left some major issues I'd like to tackle first.

The goal now is to ensure it's significantly faster than NLJ. I ran some micro-benchmarks and found it's slower, so I'd like to better understand its implementation and make it faster.

> set datafusion.execution.target_partitions = 1;
0 row(s) fetched.
Elapsed 0.001 seconds.
> SELECT *
        FROM range(30000) AS t1
        INNER JOIN range(30000) AS t2
        ON (t1.value > t2.value);
...
885262824 row(s) fetched. (First 40 displayed. Use --maxrows to adjust)
Elapsed 0.840 seconds.
> SELECT *
        FROM range(30000) AS t1
        FULL JOIN range(30000) AS t2
        ON (t1.value > t2.value);
...

885262825 row(s) fetched. (First 40 displayed. Use --maxrows to adjust)
Elapsed 1.592 seconds.

They're Q11 and Q12 from https://github.com/apache/datafusion/blob/main/benchmarks/src/nlj.rs
Using NLJ they're both around 0.55s; also, the results don't match.


The remaining part for me to review:

  • minor issues in classic_join.rs
  • test coverage

}
}

// Holds all information for processing incremental output
Contributor

Could you add more docs on how this struct works? Maybe with a walkthrough of a simple example.

Contributor

I took a quick glance; it seems possible to cut a big output and emit it piece by piece according to batch_size. However, it does not support combining/coalescing small batches up to batch_size?

Contributor Author

Is it possible to do the combining up to batch_size in a follow-up? I am going to try to just get the right behaviour in this pull request + simplify the logic (that is much needed right now) 😆

Contributor

I think it is fine to leave it to a follow-up. Just a heads-up for when you work on the performance issue: if there is any downstream operator after PWMJ, the small batches might harm performance.
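
For reference, a minimal sketch of one way output coalescing could work, built on Arrow's concat_batches; the OutputCoalescer type and its fields are hypothetical, not code from this PR:

use std::sync::Arc;
use arrow::compute::concat_batches;
use arrow::datatypes::Schema;
use arrow::error::Result;
use arrow::record_batch::RecordBatch;

/// Hypothetical coalescer: buffers small batches and flushes one combined
/// batch once at least `batch_size` rows have accumulated.
struct OutputCoalescer {
    schema: Arc<Schema>,
    buffered: Vec<RecordBatch>,
    buffered_rows: usize,
    batch_size: usize,
}

impl OutputCoalescer {
    /// Returns a combined batch once the row threshold is reached.
    fn push(&mut self, batch: RecordBatch) -> Result<Option<RecordBatch>> {
        self.buffered_rows += batch.num_rows();
        self.buffered.push(batch);
        if self.buffered_rows < self.batch_size {
            return Ok(None);
        }
        // Concatenate the buffered small batches into a single output batch.
        let out = concat_batches(&self.schema, self.buffered.iter())?;
        self.buffered.clear();
        self.buffered_rows = 0;
        Ok(Some(out))
    }
}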


// For Left, Right, Full, and Inner joins, incoming stream batches will already be sorted.
#[allow(clippy::too_many_arguments)]
fn resolve_classic_join(
Contributor

I found the implementation of this function quite hard to understand; is it possible to structure it this way:

// Materialize the result when possible
if batch_process_state.has_ready_batch() {
    return Ok(batch_process_state.finish());
}
// Else advance the stream/buffer side index, and put the matched indices
// into `batch_process_state` to materialize incrementally later
// ...
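
Spelled out slightly more as a self-contained toy (the BatchProcessState methods here are invented for illustration, not the PR's actual API):

struct Batch; // stand-in for an Arrow RecordBatch

#[derive(Default)]
struct BatchProcessState {
    ready: Option<Batch>,                 // batch already materialized, ready to emit
    pending_matches: Vec<(usize, usize)>, // matched (stream, buffer) index pairs
}

impl BatchProcessState {
    fn has_ready_batch(&self) -> bool {
        self.ready.is_some()
    }
    fn finish(&mut self) -> Batch {
        self.ready.take().expect("checked by has_ready_batch")
    }
    fn advance_and_collect_matches(&mut self) {
        // In the real operator this would advance the stream/buffer indices
        // and record matched index pairs for later materialization.
        self.pending_matches.push((0, 0));
    }
}

fn resolve_classic_join(state: &mut BatchProcessState) -> Option<Batch> {
    // 1. Materialize the result when possible.
    if state.has_ready_batch() {
        return Some(state.finish());
    }
    // 2. Otherwise advance the indices and stash matched pairs for
    //    incremental materialization on later calls.
    state.advance_and_collect_matches();
    None
}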

@jonathanc-n
Contributor Author

commented Sep 16, 2025

I'll try to complete all the refactoring tomorrow. The performance issue may be due to which sides are being used; I will need to look into that.

The results don't match because it currently doesn't allow execution of more than one record batch.

@jonathanc-n
Contributor Author

The performance saw a similar hit in #16660 (the benchmark is in the description). I think I can tune when to use this join based on the incoming size in a follow-up; for now the config will restrict this join to keep it purely experimental.

@jonathanc-n
Contributor Author

How did you get the incorrect result? I'm testing query 12 and it doesn't get optimized into a PieceWiseMergeJoin.

@2010YOUY01
Contributor

> The performance saw a similar hit in #16660 (the benchmark is in the description). I think I can tune when to use this join based on the incoming size in a follow-up; for now the config will restrict this join to keep it purely experimental.

That bench doesn't include sort time; PWMJ should be faster than NLJ even when it includes the sorting overhead (n·log(n) vs. n²).

I think the main motivation for adding this executor is its performance advantage, so we probably shouldn’t merge an initial PR without first getting it to a good performance level. (Also, since there aren’t many merge conflicts to resolve for this PR, I don’t think there’s any rush.)

I can help diagnose it later.

> How did you get the incorrect result? I'm testing query 12 and it doesn't get optimized into a PieceWiseMergeJoin.

The config set datafusion.execution.target_partitions = 1; should get PWMJ triggered.

Moreover, we should get the sqlite extended tests passing later, either by configuring target_partitions to 1 or by enabling parallel execution for it. (BTW, why don't we support larger target_partitions now? It seems not to require many changes; only the stream side has to be round-robin repartitioned.)
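
For reference, the round-robin idea could presumably reuse the existing RepartitionExec; a hedged sketch under that assumption (the helper name is hypothetical, and the buffered side would still need to be visible to every partition):

use std::sync::Arc;
use datafusion::error::Result;
use datafusion::physical_plan::repartition::RepartitionExec;
use datafusion::physical_plan::{ExecutionPlan, Partitioning};

/// Hypothetical helper: fan the streamed side out across `target_partitions`
/// batch by batch, so each partition can run its own PWMJ instance.
fn round_robin_stream_side(
    stream_side: Arc<dyn ExecutionPlan>,
    target_partitions: usize,
) -> Result<Arc<dyn ExecutionPlan>> {
    Ok(Arc::new(RepartitionExec::try_new(
        stream_side,
        Partitioning::RoundRobinBatch(target_partitions),
    )?))
}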
