Skip to content

Conversation

@ahkcs
Copy link
Contributor

@ahkcs ahkcs commented Oct 21, 2025

Description


Summary

This PR extends the bin command to support time-related fields (timestamp, date, time) in non-aggregation scenarios using the WIDTH_BUCKET UDF.
Previously, bins on time fields only worked with aggregations via auto_date_histogram pushdown.
Now, users can apply binning on timestamp fields in projection queries without aggregation.

Related Issues

Resolves ##4578

ahkcs added 2 commits October 20, 2025 20:46
Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Kai Huang <[email protected]>
ahkcs added 3 commits October 20, 2025 21:50
Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Kai Huang <[email protected]>
Comment on lines 33 to 34
RexNode maxValue = context.relBuilder.max(fieldExpr).over().toRex();
RexNode dataRange = context.relBuilder.call(SqlStdOperatorTable.MINUS, maxValue, minValue);
RexNode minValue = context.relBuilder.min(fieldExpr).over().toRex();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitnit: more intuitive to start with minValue

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switched order

}

/** Width bucket calculation using nice number algorithm. */
public static String calculateWidthBucket(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is cleaner to have separate method for timestamp.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separated

@@ -0,0 +1 @@
{"calcite":{"logical":"LogicalSystemLimit(fetch=[10000], type=[QUERY_SIZE_LIMIT])\n LogicalProject(account_number=[$0], firstname=[$1], address=[$2], balance=[$3], gender=[$4], city=[$5], employer=[$6], state=[$7], email=[$8], lastname=[$9], age=[$16])\n LogicalSort(fetch=[5])\n LogicalProject(account_number=[$0], firstname=[$1], address=[$2], balance=[$3], gender=[$4], city=[$5], employer=[$6], state=[$7], email=[$9], lastname=[$10], _id=[$11], _index=[$12], _score=[$13], _maxscore=[$14], _sort=[$15], _routing=[$16], age=[WIDTH_BUCKET($8, 3, MIN($8) OVER (), MAX($8) OVER ())])\n CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]])\n","physical":"EnumerableLimit(fetch=[10000])\n EnumerableCalc(expr#0..12=[{inputs}], expr#13=[3], expr#14=[WIDTH_BUCKET($t8, $t13, $t11, $t12)], proj#0..7=[{exprs}], email=[$t9], lastname=[$t10], age=[$t14])\n EnumerableLimit(fetch=[5])\n EnumerableWindow(window#0=[window(aggs [MIN($8), MAX($8)])])\n CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[PROJECT->[account_number, firstname, address, balance, gender, city, employer, state, age, email, lastname]], OpenSearchRequestBuilder(sourceBuilder={\"from\":0,\"timeout\":\"1m\",\"_source\":{\"includes\":[\"account_number\",\"firstname\",\"address\",\"balance\",\"gender\",\"city\",\"employer\",\"state\",\"age\",\"email\",\"lastname\"],\"excludes\":[]}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])\n"}} No newline at end of file
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like JSON file instead of YAML.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

String timestampStr = (String) timestamp;
java.time.LocalDateTime localDateTime =
java.time.LocalDateTime.parse(
timestampStr, DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this pattern is different from something used in time_bins_test_data.json file?

And I was not quite sure if we can always assume this format.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test data uses ISO 8601 format (2025-07-28T10:00:00 with 'T') for ingestion into OpenSearch, but when Calcite retrieves timestamp values and converts them to strings, it uses the SQL literal format (yyyy-MM-dd HH:mm:ss with space).

private static String formatTimestamp(long timestampMillis) {
Instant instant = Instant.ofEpochMilli(timestampMillis);
ZonedDateTime zdt = instant.atZone(ZoneOffset.UTC);
return zdt.format(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make DateTimeFormatter instance a constant? (it is used above as well)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great suggestion!
I updated the following:
For parsing (input) - uses DATE_TIMESTAMP_FORMATTER
For formatting (output) - uses SQL_LITERAL_DATE_TIME_FORMAT

ahkcs added 5 commits October 21, 2025 19:52
Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Kai Huang <[email protected]>
@ahkcs ahkcs requested a review from ykmr1224 October 22, 2025 03:50
@yuancu yuancu added the bug Something isn't working label Oct 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants