Support `chart` command in PPL #4579

Open

yuancu wants to merge 26 commits into opensearch-project:main from yuancu:issues/399

Collaborator

yuancu commented Oct 16, 2025 (edited)

Description

The chart command returns an aggregation result in a two-dimensional table format.

Work items:

Related Issues

Resolves #399

Implementation Walk-through

Ideally, chart should pivot the result into a two-dimensional table. E.g. for the following table:

a	b	val
m	x	3
m	y	4

| chart avg(val) by a, b should make it a table like this:

a	x	y
m	3	4

However, it seems dynamic pivoting is not supported in SQL/Calcite (see the original discussion in #3965 (comment)). Therefore, the result table of the implemented chart looks like this:

a	b	avg(val)
m	x	3
m	y	4

The pivoting can be performed in the front-end.
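
As a rough illustration of the front-end pivot mentioned above, the sketch below turns long-format rows of the form [row-split, col-split, aggregation] into a nested map keyed by row split and then by column split. The class `ChartPivotSketch` and its `Row` type are hypothetical names used only for this example; they are not part of the PR.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ChartPivotSketch {
  /** One row of the long-format result: [row-split, col-split, aggregation]. */
  static final class Row {
    final String rowSplit;
    final String colSplit;
    final double value;

    Row(String rowSplit, String colSplit, double value) {
      this.rowSplit = rowSplit;
      this.colSplit = colSplit;
      this.value = value;
    }
  }

  /** Pivots rows into rowSplit -> (colSplit -> value), keeping first-seen order. */
  static Map<String, Map<String, Double>> pivot(List<Row> rows) {
    Map<String, Map<String, Double>> table = new LinkedHashMap<>();
    for (Row r : rows) {
      table.computeIfAbsent(r.rowSplit, k -> new LinkedHashMap<>()).put(r.colSplit, r.value);
    }
    return table;
  }

  public static void main(String[] args) {
    List<Row> rows = Arrays.asList(new Row("m", "x", 3), new Row("m", "y", 4));
    System.out.println(pivot(rows)); // {m={x=3.0, y=4.0}}
  }
}
```

Each distinct col-split value becomes a column of the pivoted table, which is why the number of columns cannot be known before the query runs.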

The above operation is equivalent to stats avg(val) by a, b -- this is the case when parameters like usenull, useother, and limit are not involved in the result.

When these parameters are involved, the chart command finds the top-N categories of b, aggregates the rest into an OTHER category, and aggregates rows whose b is null into a "NULL" category. This leads to the following implementation:

1. normal aggregation based on a, b (equivalent to stats agg_func by a, b)
2. find out the top-N categories (unique values of column b) by aggregating on the above aggregation results
   1. aggregate on b
   2. sort on aggregation results
   3. number the rows
3. left join the ranked results with the original aggregation
4. keep rows whose row number is no greater than the limit, categorizing the rest to OTHER or NULL
5. aggregate again because values categorized into OTHER or NULL need to be merged
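
The steps above can be sketched with plain Java collections. This is a simplified model, not the Calcite plan built in the PR: it assumes SUM as the aggregation (re-aggregating partial sums is valid for SUM, while functions such as avg would need different handling), hard-codes the labels OTHER and NULL, and breaks ranking ties alphabetically. `ChartTopNSketch`, `Row`, and `chartSum` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ChartTopNSketch {
  static final class Row {
    final String a;
    final String b; // column split, may be null
    final double val;

    Row(String a, String b, double val) {
      this.a = a;
      this.b = b;
      this.val = val;
    }
  }

  /** Returns [a, label] -> sum(val); a limit of 0 or less keeps every category. */
  static Map<List<String>, Double> chartSum(List<Row> rows, int limit) {
    // Step 1: normal aggregation, sum(val) by a, b.
    Map<List<String>, Double> aggregated = new LinkedHashMap<>();
    for (Row r : rows) {
      aggregated.merge(Arrays.asList(r.a, r.b), r.val, Double::sum);
    }

    // Step 2: grand total per b (rows without a column split are skipped),
    // sorted in descending order and numbered from 1.
    Map<String, Double> grandTotal = new HashMap<>();
    for (Map.Entry<List<String>, Double> e : aggregated.entrySet()) {
      String b = e.getKey().get(1);
      if (b != null) {
        grandTotal.merge(b, e.getValue(), Double::sum);
      }
    }
    List<String> ranked = new ArrayList<>(grandTotal.keySet());
    ranked.sort((x, y) -> {
      int byTotal = Double.compare(grandTotal.get(y), grandTotal.get(x));
      return byTotal != 0 ? byTotal : x.compareTo(y);
    });
    Map<String, Integer> rowNumber = new HashMap<>();
    for (int i = 0; i < ranked.size(); i++) {
      rowNumber.put(ranked.get(i), i + 1);
    }

    // Steps 3-5: look up each row's rank (the left join), relabel it,
    // then aggregate again so that OTHER and NULL rows are merged.
    Map<List<String>, Double> result = new LinkedHashMap<>();
    for (Map.Entry<List<String>, Double> e : aggregated.entrySet()) {
      String b = e.getKey().get(1);
      String label;
      if (b == null) {
        label = "NULL";
      } else if (limit <= 0 || rowNumber.get(b) <= limit) {
        label = b;
      } else {
        label = "OTHER";
      }
      result.merge(Arrays.asList(e.getKey().get(0), label), e.getValue(), Double::sum);
    }
    return result;
  }

  public static void main(String[] args) {
    List<Row> rows = Arrays.asList(
        new Row("m", "x", 7), new Row("m", "y", 4),
        new Row("n", "x", 1), new Row("n", "y", 2),
        new Row("n", "z", 1), new Row("n", null, 5));
    System.out.println(chartSum(rows, 1));
  }
}
```

With limit 1, only x (grand total 8) keeps its own label; y and z collapse into OTHER, and the final aggregation merges the two OTHER rows of n into one.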

Note:

This implementation does not reuse the timechart implementation, in order to circumvent some existing bugs. A follow-up PR will merge the two implementations, as chart is essentially a superset of timechart in terms of functionality.

Future work items

support multiple aggregation functions (left as a TODO: the output would be messy when multiple aggregations are involved because the results are not pivoted)
unify implementation of timechart and chart
support more bin options like bins (after "Fix bins on time-related fields" #4612)

Check List

New functionality includes testing.
New functionality has been documented.
New functionality has javadoc added.
New functionality has a user manual doc added.
New PPL command checklist all confirmed.
API changes companion pull request created.
Commits are signed per the DCO using --signoff or -s.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

yuancu added the feature label

yuancu force-pushed the issues/399 branch 2 times, most recently from 8297023 to 6b8934e Compare

October 24, 2025 06:12

yuancu marked this pull request as ready for review

October 24, 2025 08:56

yuancu requested review from GumpacG, LantaoJin, MaxKsyunz, RyanL1997, Swiddis, YANG-DB, Yury-Fridlyand, acarbonetto, anirudha, dai-chen, derek-ho, forestmvey, joshuali925, kavithacm, mengweieric, noCharger, penghuo, ps48, qianheng-aws, seankao-az, vamsimanohar and ykmr1224 as code owners

October 24, 2025 08:56

yuancu marked this pull request as draft

October 28, 2025 14:38

yuancu marked this pull request as ready for review

October 29, 2025 01:58

yuancu changed the title ~~WIP: Support chart command in PPL~~ Support chart command in PPL

yuancu force-pushed the issues/399 branch from e0c92e1 to 692cbc0 Compare

October 29, 2025 07:52

yuancu force-pushed the issues/399 branch from db25ef7 to 86b4cb3 Compare

October 29, 2025 12:04

yuancu added 23 commits

October 30, 2025 10:26


          WIP: Make poc implementation for chart command

9a41cb4

Signed-off-by: Yuanchun Shen <[email protected]>


          Support param useother and otherstr

Signed-off-by: Yuanchun Shen <[email protected]>


          Support usenull and nullstr (when both row split and col split present)

851f536

Signed-off-by: Yuanchun Shen <[email protected]>


          Append a final aggregation to merge OTHER categories

70d4722

Signed-off-by: Yuanchun Shen <[email protected]>


          Handle common agg functions for OTHER category for timechart

9253e67

Signed-off-by: Yuanchun Shen <[email protected]>

# Conflicts:
#	core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
#	integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java


          Fix timechart IT

19af837

Signed-off-by: Yuanchun Shen <[email protected]>


          Sort earliest results with asc order

6c5df2e

Signed-off-by: Yuanchun Shen <[email protected]>


          Support non-string fields as column split

b07c3c4

Signed-off-by: Yuanchun Shen <[email protected]>


          Fix min/earliest order & fix non-accumulative agg for chart

269d4e2

Signed-off-by: Yuanchun Shen <[email protected]>


          Hint non-null in aggregateWithTrimming

cf4c9de

Signed-off-by: Yuanchun Shen <[email protected]>


          Add integration tests for chart command

9b7a891

Signed-off-by: Yuanchun Shen <[email protected]>


          Add unit tests

Signed-off-by: Yuanchun Shen <[email protected]>


          Add doc for chart command

3c4c13a

Signed-off-by: Yuanchun Shen <[email protected]>


          Prompt users that multiple agg is not supported

5de82dd

Signed-off-by: Yuanchun Shen <[email protected]>


          Add explain ITs

d3858cc

Signed-off-by: Yuanchun Shen <[email protected]>


          Remove unimplemented support for multiple aggregations in chart command

8f4d6d4

Signed-off-by: Yuanchun Shen <[email protected]>


          Add unit tests for chart command

b14a764

Signed-off-by: Yuanchun Shen <[email protected]>


          Remove irrelevant yaml test

7126c82

Signed-off-by: Yuanchun Shen <[email protected]>


          Tweak chart.rst

7d294c7

Signed-off-by: Yuanchun Shen <[email protected]>


          Swap the order of chart output to ensure metrics come last

9bfb577

Signed-off-by: Yuanchun Shen <[email protected]>


          Filter rows without col split when calculate grand total

2c8d632

Signed-off-by: Yuanchun Shen <[email protected]>


          Chores: tweak code order

1fe81b3

Signed-off-by: Yuanchun Shen <[email protected]>


          Add anonymize test to chart command

d7949ef

Signed-off-by: Yuanchun Shen <[email protected]>

yuancu force-pushed the issues/399 branch from be9063f to d7949ef Compare

October 30, 2025 02:28

yuancu commented

Collaborator Author

yuancu left a comment

Code explanation.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java

    
                  // Convert the column split to string if necessary: column split was supposed to be pivoted to
                  // column names. This guarantees that its type compatibility with useother and usenull
                  RexNode colSplit = relBuilder.field(1);

Collaborator Author

yuancu Oct 30, 2025

The fields are [row-split, col-split, aggregation] now

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java

Comment on lines +2085 to +2091

    
                  if (!SqlTypeUtil.isCharacter(colSplit.getType())) {
                    colSplit =
                        relBuilder.alias(
                            context.rexBuilder.makeCast(
                                UserDefinedFunctionUtils.NULLABLE_STRING, colSplit, true, true),
                            columSplitName);
                  }

Collaborator Author

yuancu Oct 30, 2025

Convert the column split to string so that they can be labels of columns once pivoted. This also guarantees that its type is compatible with nullstr and otherstr.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java

    
                  // 1: column-split, 2: agg
                  relBuilder.project(relBuilder.field(1), relBuilder.field(2));
                  // Make sure that rows who don't have a column split not interfere grand total calculation
                  relBuilder.filter(relBuilder.isNotNull(relBuilder.field(0)));

Collaborator Author

yuancu Oct 30, 2025

testChartWithNullAndLimit covers this case. Without this filter, rows that have no column split would also be numbered (ranked) when their aggregation result is large enough.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java

Comment on lines +2105 to +2110

    
                  // Apply sorting: for MIN/EARLIEST, reverse the top/bottom logic
                  boolean smallestFirst =
                      aggFunction == BuiltinFunctionName.MIN || aggFunction == BuiltinFunctionName.EARLIEST;
                  if (config.top != smallestFirst) {
                    grandTotal = relBuilder.desc(grandTotal);
                  }

Collaborator Author

yuancu Oct 30, 2025

See explanations in #4594
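
The condition in the snippet above can be read as a small truth table. The sketch below reproduces only that boolean logic outside of Calcite; the `Agg` enum and the `sortDescending` helper are hypothetical stand-ins for `BuiltinFunctionName` and the `relBuilder.desc(...)` decision.

```java
final class ChartSortDirectionSketch {
  /** Hypothetical stand-in for the aggregation functions relevant to ranking. */
  enum Agg { COUNT, SUM, MAX, MIN, EARLIEST }

  /** Returns true when the grand totals should be sorted in descending order. */
  static boolean sortDescending(boolean top, Agg agg) {
    // For MIN/EARLIEST the "top" categories are those with the smallest values.
    boolean smallestFirst = agg == Agg.MIN || agg == Agg.EARLIEST;
    return top != smallestFirst;
  }

  public static void main(String[] args) {
    for (Agg agg : new Agg[] {Agg.SUM, Agg.MIN}) {
      for (boolean top : new boolean[] {true, false}) {
        System.out.println(
            (top ? "top" : "bottom") + " " + agg + " -> "
                + (sortDescending(top, agg) ? "descending" : "ascending"));
      }
    }
  }
}
```

Under this reading, top with SUM and bottom with MIN both sort descending, while bottom with SUM and top with MIN both sort ascending.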

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java

Comment on lines +2127 to +2132

    
                  relBuilder.push(aggregated);
                  relBuilder.push(ranked);
                  // on column-split = group key
                  relBuilder.join(
                      JoinRelType.LEFT, relBuilder.equals(relBuilder.field(2, 0, 1), relBuilder.field(2, 1, 0)));

Collaborator Author

yuancu Oct 30, 2025

aggregated: [row-split, col-split, aggregation]
ranked: [col-split, grand-total, row-number]

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java

Comment on lines +2169 to +2171

    
                  relBuilder.aggregate(
                      relBuilder.groupKey(relBuilder.field(0), relBuilder.field(1)),
                      buildAggCall(context.relBuilder, aggFunction, relBuilder.field(2)).as(aggFieldName));

Collaborator Author

yuancu Oct 30, 2025

Final aggregation: to merge values in the OTHER categories.

penghuo requested changes

docs/user/ppl/cmd/chart.rst Outdated

    
              * **limit**: optional. Specifies the number of distinct values to display when using column split.

                * Default: 10
                * Syntax: ``limit=(top|bottom) <number>`` or ``limit=<number>`` (defaults to top)

Collaborator

penghuo Oct 30, 2025

Please explain the definition of top in the doc. Is it equivalent to stats dc by col | sort -dc, +col?

Collaborator Author

yuancu Oct 31, 2025

It keeps the top-K categories (distinct column-split values).

E.g.

chart limit=1 count() by a b keeps the single b category with the most rows.
chart limit=bottom3 sum(value) by a b keeps the 3 b categories with the smallest sums of value.
chart limit=top2 min(value) by a b keeps the 2 b categories whose per-category minimum values are the smallest.
chart limit=bottom2 min(value) by a b keeps the 2 b categories whose per-category minimum values are the largest.
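
To make the accepted forms of the limit value concrete, here is a minimal parsing sketch for `<number>`, `top<number>`, and `bottom<number>`. It is purely illustrative: the PR parses this in the PPL grammar, and `ChartLimitSketch`, `Limit`, and `parse` are hypothetical names that do not appear in the codebase.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ChartLimitSketch {
  /** Parsed limit: which end of the ranking to keep, and how many categories. */
  static final class Limit {
    final boolean top;
    final int n;

    Limit(boolean top, int n) {
      this.top = top;
      this.n = n;
    }
  }

  // An optional "top" or "bottom" prefix directly followed by digits, with no space.
  private static final Pattern LIMIT = Pattern.compile("(top|bottom)?(\\d+)");

  static Limit parse(String value) {
    Matcher m = LIMIT.matcher(value);
    if (!m.matches()) {
      throw new IllegalArgumentException("invalid limit: " + value);
    }
    // A bare number defaults to top.
    boolean top = !"bottom".equals(m.group(1));
    return new Limit(top, Integer.parseInt(m.group(2)));
  }

  public static void main(String[] args) {
    Limit limit = parse("bottom3");
    System.out.println((limit.top ? "top" : "bottom") + " " + limit.n);
  }
}
```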

docs/user/ppl/cmd/chart.rst Outdated

    
                * Set to 0 to show all distinct values without any limit.
                * Only applies when column split presents (by 2 fields or over...by... coexists).

              * **useother**: optional. Controls whether to create an "OTHER" category for distinct column values beyond the limit.

Collaborator

penghuo Oct 30, 2025

column -> column_split

Collaborator Author

yuancu Oct 31, 2025

Fixed

docs/user/ppl/cmd/chart.rst Outdated

    
                * When set to true, distinct values beyond the limit are grouped into an "OTHER" category.
                * Only applies when using column split and when there are more distinct column values than the limit.

              * **usenull**: optional. Controls whether to include null values as a separate category.

Collaborator

penghuo Oct 30, 2025

Change the doc to make it clearer.

usenull=true only applies to column_split.
row_split should always be a non-null value.

Collaborator Author

yuancu Oct 31, 2025 (edited)

Fixed. row_split can actually contain null; it is handled in the same way as in normal aggregations such as stats count() by a, b where column a contains null values.

docs/user/ppl/cmd/chart.rst Outdated

    
              Notes
              =====

              * The column split field in the result will become strings so that they are compatible with ``nullstr`` and ``otherstr`` and can be used as column names once pivoted.

Collaborator

penghuo Oct 30, 2025

The column split field in the result will become strings ->
The fields generated by column splitting are converted to strings

Collaborator Author

yuancu Oct 31, 2025

Fixed. Thanks for the suggestion!

docs/user/ppl/cmd/chart.rst

Comment on lines +144 to +152

    
                  os> source=accounts | chart limit=1 count() over gender by age
                  fetched rows / total rows = 3/3
                  +--------+-------+---------+
                  | gender | age   | count() |
                  |--------+-------+---------|
                  | M      | OTHER | 2       |
                  | M      | 33    | 1       |
                  | F      | OTHER | 1       |
                  +--------+-------+---------+

Collaborator

penghuo Oct 30, 2025

Shouldn't the result include another row?
F 33 0

Then the pivot table would be:
gender	33	OTHER
M	1	2
F	0	1

Collaborator Author

yuancu Oct 31, 2025 (edited)

Since we are not really pivoting, I think it's better to omit empty groups. This avoids a large, sparse response and reduces traffic. Besides, other aggregations don't return results for empty buckets either.

timechart also states that it omits those buckets:

Only combinations with actual data are included in the results - empty combinations are omitted rather than showing null or zero values.

If the front-end wants them back, it can easily fill the missing groups with null or 0.
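
A minimal sketch of that front-end fill, assuming count() results and using the three rows from the example under discussion. `ChartFillMissingSketch`, `Row`, and `pivotWithZeros` are hypothetical names; a real client might prefer null over 0 for aggregations where 0 is not a neutral value.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ChartFillMissingSketch {
  /** One row of the chart result: [row-split, col-split, count()]. */
  static final class Row {
    final String rowSplit;
    final String colSplit;
    final long count;

    Row(String rowSplit, String colSplit, long count) {
      this.rowSplit = rowSplit;
      this.colSplit = colSplit;
      this.count = count;
    }
  }

  /** Pivots the rows and fills every omitted (rowSplit, colSplit) combination with 0. */
  static Map<String, Map<String, Long>> pivotWithZeros(List<Row> rows) {
    // Collect every column label first so that each pivoted row gets the same columns.
    Set<String> columns = new LinkedHashSet<>();
    for (Row r : rows) {
      columns.add(r.colSplit);
    }
    Map<String, Map<String, Long>> table = new LinkedHashMap<>();
    for (Row r : rows) {
      Map<String, Long> cells = table.computeIfAbsent(r.rowSplit, k -> {
        Map<String, Long> init = new LinkedHashMap<>();
        for (String c : columns) {
          init.put(c, 0L);
        }
        return init;
      });
      cells.put(r.colSplit, r.count);
    }
    return table;
  }

  public static void main(String[] args) {
    List<Row> rows = Arrays.asList(
        new Row("M", "OTHER", 2), new Row("M", "33", 1), new Row("F", "OTHER", 1));
    System.out.println(pivotWithZeros(rows)); // {M={OTHER=2, 33=1}, F={OTHER=1, 33=0}}
  }
}
```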

docs/user/ppl/cmd/chart.rst Outdated

    
              PPL query::

                  os> source=accounts | chart limit=top 1 useother=true otherstr='minor_gender' count() over state by gender

Collaborator

penghuo Oct 30, 2025

syntax should be limit=top10

Collaborator Author

yuancu Oct 31, 2025

Fixed. Thanks for double checking

docs/user/ppl/cmd/chart.rst

    
              PPL query::

                  os> source=accounts |  chart usenull=true nullstr='employer not specified' count() over firstname by employer

Collaborator

penghuo Oct 30, 2025

Add an example that demonstrates converting column_split to a string.

Collaborator Author

yuancu Oct 31, 2025

It's actually covered by example 3 and 4. Updated their descriptions.

yuancu added 3 commits

October 31, 2025 11:10


          Merge remote-tracking branch 'origin/main' into issues/399

154cbc4

Signed-off-by: Yuanchun Shen <[email protected]>


          Change grammart from limit=top 10 to limit=top10

7bef202

Signed-off-by: Yuanchun Shen <[email protected]>


          Update chart doc

9da3bd2

Signed-off-by: Yuanchun Shen <[email protected]>

Reviewers

penghuo: requested changes
ps48: awaiting requested review (code owner)
kavithacm: awaiting requested review (code owner)
derek-ho: awaiting requested review (code owner)
joshuali925: awaiting requested review (code owner)
dai-chen: awaiting requested review (code owner)
YANG-DB: awaiting requested review (code owner)
mengweieric: awaiting requested review (code owner)
vamsimanohar: awaiting requested review (code owner)
Swiddis: awaiting requested review (code owner)
seankao-az: awaiting requested review (code owner)
MaxKsyunz: awaiting requested review (code owner)
Yury-Fridlyand: awaiting requested review (code owner)
anirudha: awaiting requested review (code owner)
forestmvey: awaiting requested review (code owner)
acarbonetto: awaiting requested review (code owner)
GumpacG: awaiting requested review (code owner)
ykmr1224: awaiting requested review (code owner)
LantaoJin: awaiting requested review (code owner)
noCharger: awaiting requested review (code owner)
qianheng-aws: awaiting requested review (code owner)
RyanL1997: awaiting requested review (code owner)

Requested changes must be addressed to merge this pull request.

Labels