Implementation of a DP join enumeration algorithm based on hypergraphs for PostgreSQL.
Original paper: "Dynamic Programming Strikes Back".

This is a PostgreSQL extension that implements a hypergraph-based dynamic programming algorithm (DPhyp) for finding join orderings.
In PostgreSQL there are 2 builtin algorithms:

- Conventional dynamic programming (called DPsize)
- GEQO, a randomized algorithm

DPsize can find a suitable join ordering, but its performance degrades dramatically starting from about 12 tables. To keep planning time bounded, GEQO is used - a randomized algorithm that is enabled by the `geqo` GUC and kicks in when the number of tables reaches `geqo_threshold`.

The DPhyp algorithm can solve this problem - for some queries it can find a close-to-optimal plan in a reasonable amount of time. The idea of the algorithm is simple: instead of trying to join relations with each other exhaustively (DPsize), we find links between tables and use them to guide the search (DPhyp). This reduces the search space and avoids considering obviously impossible JOINs.

NOTE: the join ordering found may not be optimal - DPsize's plan can be better.
Building is the same as for other extensions: clone into `contrib` or use PGXS, and run `make install`.

Installation into a database only requires adding the extension to `shared_preload_libraries`:

```
shared_preload_libraries = 'pg_dphyp'
```
There are 6 GUC settings:

- `pg_dphyp.enabled` (boolean) - controls whether the algorithm is enabled. It is `on` by default.
- `pg_dphyp.cj_strategy` (enum) - controls the behavior of the algorithm in the presence of cross joins:
  - `no` - no actions are taken; if the algorithm failed to build the final JOIN relation, DPsize/GEQO is called
  - `pass` - if the algorithm failed to build the final JOIN relation, it tries to find all built JOIN relation groups (disjoint relation sets) and passes them to DPsize/GEQO (thus the work done is not thrown away)
  - `detect` - try to find all disjoint relation sets before the algorithm is run and create hyperedges for them. If the algorithm still fails to create the final JOIN relation, DPsize/GEQO is called
- `pg_dphyp.min_relations` (int) - minimal number of relations passed to a single join search run for which DPhyp will be used. Default is `0`, so it is always enabled.
- `pg_dphyp.max_relations` (int) - maximal number (inclusive) of relations passed to a single join search run; above it GEQO will be used, so this acts like `geqo_threshold`. Default is `16`.
- `pg_dphyp.count_cc` (boolean) - count the number of connected subgraphs and use this value to preallocate the hash table. This can give some performance improvement.
- `pg_dphyp.geqo_cc_threshold` (int) - maximal number of connected subgraphs above which GEQO will be run. If the value is `0`, the check is off and nothing is counted; otherwise the number of connected subgraphs will be counted and used regardless of the `count_cc` GUC. Default is `10000`.
I assume that you are familiar with the DPhyp algorithm, so there will be no explanation of how it works. Instead, this section describes the key implementation ideas.
PostgreSQL has the `Bitmapset` structure that allows working with bit sets of any size.
But 1) it requires a memory allocation each time a set is created or modified, and 2) in most cases the number of relations in a JOIN is less than 64.
So for performance reasons all operations are performed on a plain `bitmapword`.
To iterate over subsets, bit manipulation is used.
The expression `(subset - set) & set` effectively adds 1 to the current subset while staying within the bits of `set` (the first value, `(-set) & set`, is the lowest bit), so we iterate over subsets as incrementing numbers.
I.e. for the neighborhood `010110` we get the following subsets:
```
000010
000100
000110
010000
010010
010100
010110
```
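For illustration, a minimal C sketch of this enumeration (names here are illustrative, not the extension's actual code):

```c
#include <stdint.h>

typedef uint64_t bitmapword;

/*
 * Enumerate all non-empty subsets of `set` in increasing numeric order.
 * (subset - set) & set is "subset + 1" confined to the bits of `set`;
 * starting from 0 the first value is (-set) & set (the lowest bit), and
 * after the full set the expression wraps around to 0, ending the loop.
 */
static void
iterate_subsets(bitmapword set)
{
    for (bitmapword subset = (-set) & set;
         subset != 0;
         subset = (subset - set) & set)
    {
        /* process `subset` here */
    }
}
```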
Before DPhyp is run, the hypergraph must be created - more specifically, the hyperedges.
To build hyperedges, predicates are used.
The list of `RelOptInfo` passed as input to `join_search_hook` already has a processed list of `RestrictInfo` that we can use.
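For context, this is roughly how such a hook is installed; `join_search_hook` and `standard_join_search` are PostgreSQL's, while `dphyp_join_search` and the fallback logic are just my sketch of the shape, not the extension's actual code:

```c
#include "postgres.h"
#include "fmgr.h"
#include "optimizer/paths.h"

PG_MODULE_MAGIC;

static join_search_hook_type prev_join_search_hook = NULL;

static RelOptInfo *
dphyp_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
{
    /* Hypergraph construction and the DPhyp run would go here;
     * on failure fall back to the builtin DPsize/GEQO path. */
    if (prev_join_search_hook != NULL)
        return prev_join_search_hook(root, levels_needed, initial_rels);
    return standard_join_search(root, levels_needed, initial_rels);
}

void
_PG_init(void)
{
    prev_join_search_hook = join_search_hook;
    join_search_hook = dphyp_join_search;
}
```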
Edges are created from predicates found in:

- `RelOptInfo->joinclauses` - generic expressions involving multiple relations
- `PlannerInfo->eclasses` - if there are JOIN clauses involving equality
- `PlannerInfo->join_info_list` - for non-INNER join clauses

But note that these cases do not cover CROSS JOINs - use the `pg_dphyp.cj_strategy` GUC for that (the assumption is that cross joins are infrequent in real workloads).
Hyperedges are split into 2 kinds - simple and complex. A simple hyperedge contains hypernodes with a single element each; otherwise it is complex.
As an optimization, simple edges are stored as a plain `bitmapword` per relation: in the `DPHypContext->simple_edges` array indexed by the base node's number, and in `HyperNode->simple_edges` - the precomputed simple neighborhood of the hypernode.
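A rough illustration of this layout (hypothetical struct shapes for illustration only; the actual definitions in the extension may differ):

```c
/* One bitmapword per base relation: bit j is set if there is a simple
 * edge between relation i and relation j. */
typedef struct DPHypContext
{
    bitmapword  simple_edges[BITS_PER_BITMAPWORD];
    /* ... complex edges, dynamic programming table, etc. ... */
} DPHypContext;

typedef struct HyperNode
{
    bitmapword  relations;      /* relations this hypernode represents */
    bitmapword  simple_edges;   /* precomputed simple neighborhood */
    /* ... */
} HyperNode;
```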
Some join clauses can be complex and involve many different relations that have no explicit join clauses between them. For example:

```
t1.x + t2.x = t3.x + t4.x
```

Here we have the hyperedge `{t1, t2} - {t3, t4}`. If there are no join clauses `t1 - t2` and `t3 - t4`, then there is no edge between those relations in the graph.
Such hypernodes inside hyperedges are called cross join sets here (CJS for short). All relations in a CJS must be 'cross joined' with each other, that is, a simple hyperedge is created between them.
This adds complexity and can slow things down, but otherwise we would not be able to create the JOIN relation.
Computing the neighborhood of a hypernode is the core of the algorithm. According to MySQL comments:

> This function accounts for roughly 20–70% of the total DPhyp running time

So optimizing this function is key to performance.
There are several observations. First, we can compute the neighborhood for one part of a set and then just add a delta for the other part.
This allows computing a neighborhood efficiently when some part of it is already known.
In the implementation this is done by the `get_neighbors_delta` function, which accepts some base neighborhood and a delta (which must be computed).

Another observation comes from the way we iterate over neighborhood subsets. Notice that iteration is just incrementing, and when we increment, the MSB part changes much less frequently than the LSB part, so we can cache the MSB part and recalculate only the LSB part.

The third observation is that to build the neighborhood for a subset we can take some other subset and just add 1 relation to it, i.e. to build `01010` we can use `00010` or `01000`.
Let's build a table of subsets for 4 relations. Arrows show how previously built subsets can be used to calculate the current one:
```
0000 <+ <+ <+
   ^  |  |  |
0001  |  |  |
      |  |  |
0010 -+  |  |
   ^     |  |
0011     |  |
         |  |
0100 <+ -+  |
   ^  |     |
0101  |     |
      |     |
0110 -+     |
   ^        |
0111        |
            |
1000 <+ <+ -+
   ^  |  |
1001  |  |
      |  |
1010 -+  |
   ^     |
1011     |
         |
1100 <+ -+
   ^  |
1101  |
      |
1110 -+
   ^
1111
```
If you look closely, you will notice this pattern - to calculate the current neighborhood we must:

1. Unset the first `1` in the current subset (i.e. `0101` -> `0100`)
2. Take the neighborhood of that subset
3. Process that first `1` with the saved 'base' neighborhood

And lastly, note that the subset from step 2 can be addressed by the number of trailing zeros in that number - this count is the index at which its neighborhood is cached.
But note that subsets are sparse bitmaps, so instead we use the number of the current iteration - it serves the same purpose but is convenient to compute with (it is effectively a dense bitmask of the current subset).
So this gives us the following caching scheme:

- Create an array of `bitmapword` (sets) of max relation count (in the case of PostgreSQL it is `BITS_PER_BITMAPWORD`) - this is our cache.
- Each time we get a new subset, remove the lowest bit from the iteration number, calculate the number of trailing zeros and take the element from the cache at that index - this is the base neighborhood.
  - If the resulting subset is `0` (i.e. when the current one is `0010`), then take `0` (because when iterating this way the old neighborhood is in the `excluded` set)
- Calculate the neighborhood for that lowest bit starting from the 'base' neighborhood.
- Calculate the number of trailing zeros in the current iteration number and store the resulting neighborhood in the cache (at that index).

As you can see, this allows us to process a single relation at a time, with no recalculation required.
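A self-contained sketch of this scheme (simple edges only; the real code also processes complex edges and the `excluded` set, and presumably uses `bmw_*` helpers instead of compiler builtins):

```c
#include <stdint.h>

typedef uint64_t bitmapword;

/* Hypothetical per-relation simple neighborhoods (illustration only). */
extern bitmapword simple_edges[64];

static void
neighborhoods_of_subsets(bitmapword nb)
{
    bitmapword cache[64];   /* indexed by trailing zeros of iteration no. */
    bitmapword subset = 0;
    uint64_t   iter = 0;    /* dense bitmask of the current subset */

    while ((subset = (subset - nb) & nb) != 0)
    {
        iter++;

        /* Clearing the lowest bit of the iteration number identifies
         * the subset whose neighborhood we already cached. */
        uint64_t   rest = iter & (iter - 1);
        bitmapword base = rest ? cache[__builtin_ctzll(rest)] : 0;

        /* The relation added on this step is the lowest bit of the
         * current subset - only it has to be processed. */
        int        rel = __builtin_ctzll(subset);
        bitmapword result = base | simple_edges[rel];

        /* Store under the trailing-zero count of the iteration number
         * so that later supersets can pick this value up as their base. */
        cache[__builtin_ctzll(iter)] = result;

        /* `result` now holds the (simple-edge) neighborhood of `subset`. */
    }
}
```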
During the calculation of a neighborhood some parts are moving, while other parts are fixed or can only grow.
For this reason the array of complex edges is indexed - sorted by the number of trailing zeros in the 'right' part of the edge (i.e. by its lowest relation). Example array:

```
[
    xxx - 0010
    xxx - 1010
    xxx - 0100
    xxx - 1100
    xxx - 1000
]
```

The index is an array where the element at `i` points to the position in the `complex_edges` array starting from which all edges have 'right' relations greater than or equal to `i`.
This index is called `start_index`.
For the example array we have this index: `[0, 0, 0, 2, 4]`
We iterate over complex edges in 2 scenarios:

- Computing a neighborhood
- Deciding whether 2 hypernodes are connected

When computing a neighborhood, the `excluded` set is fixed and we know for sure that its low-order part is all `1`s.
So we compute the size of that part and start iterating from the index at which all relations in the 'right' part are greater than or equal to that number.
For example, if `excluded` is `0011`, then the index obtained from `start_index` is `2`.
Note that in PostgreSQL we have `bmw_rightmost_one_pos`, but not `bmw_rightmost_zero_pos`, so we use a +-1 trick: add 1 to the `excluded` set, calculate the position of the rightmost 1 (and then subtract the 1 back) - this gives us the number of trailing 1s.
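As I read it, the trick amounts to this one-liner (sketch):

```c
/* excluded = 0011 -> excluded + 1 = 0100; the position of its rightmost
 * 1 equals the number of trailing 1s in `excluded`, i.e. the size of
 * the low-order all-ones part. */
int n_trailing_ones = bmw_rightmost_one_pos(excluded + 1);
```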
In the second case we don't have an `excluded` set; instead the right hypernode (the one we test the connection against) is fixed.
The idea is the same - an edge's 'right' part definitely cannot be a subset of the tested hypernode if it contains relations smaller than that hypernode's minimal one.
For example, if the `right` hypernode is `1000`, then the index obtained from `start_index` is `4`.
The paper "Adaptive Optimization of Very Large Join Queries" gives a good heuristic for DPhyp applicability - the decision should be based on graph complexity, not on the number of tables.
This paper also gives a function to calculate the number of connected subgraphs: `count_cc` (entry point) and `count_cc_rec` (recursive calculation).
It can be used not only to check whether running DPhyp is pragmatic - it also gives the total number of subgraphs in the dptable, so we can use that too (to preallocate the hash table).
In the extension this is controlled by 2 GUCs: `pg_dphyp.count_cc` (enable/disable) and `pg_dphyp.geqo_cc_threshold` (threshold above which GEQO is run).

There is another observation - we can count this number efficiently, because we have already found the way to do it in the optimal neighborhood subset iteration above. So calculating this function is also efficient.
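A simplified, self-contained sketch of such counting (simple edges only, adjacency given as per-node bitmasks; `count_cc`/`count_cc_rec` in the extension follow the paper and handle the full hypergraph):

```c
#include <stdint.h>

/* Count connected subgraphs, stopping early once `budget` is exceeded.
 * Follows the EnumerateCsg scheme: each connected subgraph is emitted
 * exactly once by extending `subset` with subsets of its neighborhood. */
static uint64_t
count_cc_rec(const uint64_t *adj, int n, uint64_t subset, uint64_t exclude,
             uint64_t count, uint64_t budget)
{
    uint64_t nb = 0;

    /* Neighborhood of `subset`, minus already-seen/forbidden nodes. */
    for (int i = 0; i < n; i++)
        if (subset & (UINT64_C(1) << i))
            nb |= adj[i];
    nb &= ~(subset | exclude);

    /* Every non-empty subset of the neighborhood extends `subset`
     * into a new connected subgraph. */
    for (uint64_t s = (-nb) & nb; s != 0; s = (s - nb) & nb)
    {
        if (++count > budget)
            return count;
        count = count_cc_rec(adj, n, subset | s, exclude | nb,
                             count, budget);
        if (count > budget)
            return count;
    }
    return count;
}

static uint64_t
count_cc(const uint64_t *adj, int n, uint64_t budget)
{
    uint64_t count = 0;

    /* Start from every node, excluding smaller-numbered nodes so that
     * each connected subgraph is counted exactly once. */
    for (int i = n - 1; i >= 0; i--)
    {
        if (++count > budget)
            break;
        count = count_cc_rec(adj, n, UINT64_C(1) << i,
                             (UINT64_C(1) << i) - 1, count, budget);
        if (count > budget)
            break;
    }
    return count;
}
```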
This algorithm will not find a final plan in the case of disjoint relation sets.
They can arise when `CROSS JOIN`/`,` is used without any clauses connecting the relations.
There is also a case with outer parameters, because outer parameters are not handled as relations. For example, this subquery has 2 disjoint relation sets due to the outer parameter:
```sql
SELECT * FROM t1
WHERE t1.x IN (SELECT t2.x FROM t2, t3 WHERE t2.x > t1.x AND t3.x <> t1.x);
```
`pg_dphyp` is compatible with PostgreSQL down to major version 12, inclusive.
Actually, there is no real testing. The regress tests in the `sql` directory are only needed to spot differences in resulting plans after code changes.
The single file `sql/join.out` is a compilation of several regression test files located in `src/tests/regress/sql` - some set up the environment and the others are the actual tests: `join.sql`, `join_hash.sql` and `partition_join.sql`.
These tests were taken from PostgreSQL 17 - on other versions they fail due to differences in resulting plans and/or unsupported environment variables used to load data (from the `data` directory).