Skip to content

[Feature Request] Make use of dynamic pruning for faster cardinality aggregations #11959

@rishabhmaurya

Description

@rishabhmaurya

Is your feature request related to a problem? Please describe

Dynamic pruning algorithms work by dynamically adding negating filters into the query as disjunctions which are non-competitive( or found to be no more competitive while query execution) to prune the search space.
One of the utility is cardinality aggregations. Instead of using negative filter in lucene, if the field is a low cardinality field (to avoid explosion of disjunctions), then the query can be rewritten as a disjunction query of all the unique terms of the field. As the matching documents are evaluated, if the field value is a unique encountered so far, then it can be safely removed from the disjunction query. This ensures that the documents aren't processed twice for a unique value of a field on which cardinality aggregation is run. Also, the query will be early terminated if all disjunctive filters are exhausted.

Describe the solution you'd like

This logic can easily be embedded into Query.rewrite() method when the cardinality aggregation is the only aggregation and field is low cardinality field. Field cardinality upper bound for a query can be estimated either from FieldReader size() or SortedSetDocValues getValueCount().

We need run some benchmarks in order to know when to start enabling this optimization as it may not be very helpful in smaller corpus. I propose running it against noaa workload - https://github.com/opensearch-project/opensearch-benchmark-workloads/blob/bdbd4bbd74fbf319398de1ca169f16744821bcde/noaa/operations/default.json#L765

Related component

Search:Aggregations

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Metadata

Assignees

Labels

Search:AggregationsenhancementEnhancement or improvement to existing feature or requestv2.15.0Issues and PRs related to version 2.15.0

Type

No type

Projects

Status

✅ Done

Status

Done

Status

Planned work items

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions