-
Couldn't load subscription status.
- Fork 176
Fix bins on time-related fields
#4612
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Kai Huang <[email protected]>
| RexNode maxValue = context.relBuilder.max(fieldExpr).over().toRex(); | ||
| RexNode dataRange = context.relBuilder.call(SqlStdOperatorTable.MINUS, maxValue, minValue); | ||
| RexNode minValue = context.relBuilder.min(fieldExpr).over().toRex(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitnit: more intuitive to start with minValue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
switched order
| } | ||
|
|
||
| /** Width bucket calculation using nice number algorithm. */ | ||
| public static String calculateWidthBucket( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is cleaner to have separate method for timestamp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separated
| @@ -0,0 +1 @@ | |||
| {"calcite":{"logical":"LogicalSystemLimit(fetch=[10000], type=[QUERY_SIZE_LIMIT])\n LogicalProject(account_number=[$0], firstname=[$1], address=[$2], balance=[$3], gender=[$4], city=[$5], employer=[$6], state=[$7], email=[$8], lastname=[$9], age=[$16])\n LogicalSort(fetch=[5])\n LogicalProject(account_number=[$0], firstname=[$1], address=[$2], balance=[$3], gender=[$4], city=[$5], employer=[$6], state=[$7], email=[$9], lastname=[$10], _id=[$11], _index=[$12], _score=[$13], _maxscore=[$14], _sort=[$15], _routing=[$16], age=[WIDTH_BUCKET($8, 3, MIN($8) OVER (), MAX($8) OVER ())])\n CalciteLogicalIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]])\n","physical":"EnumerableLimit(fetch=[10000])\n EnumerableCalc(expr#0..12=[{inputs}], expr#13=[3], expr#14=[WIDTH_BUCKET($t8, $t13, $t11, $t12)], proj#0..7=[{exprs}], email=[$t9], lastname=[$t10], age=[$t14])\n EnumerableLimit(fetch=[5])\n EnumerableWindow(window#0=[window(aggs [MIN($8), MAX($8)])])\n CalciteEnumerableIndexScan(table=[[OpenSearch, opensearch-sql_test_index_account]], PushDownContext=[[PROJECT->[account_number, firstname, address, balance, gender, city, employer, state, age, email, lastname]], OpenSearchRequestBuilder(sourceBuilder={\"from\":0,\"timeout\":\"1m\",\"_source\":{\"includes\":[\"account_number\",\"firstname\",\"address\",\"balance\",\"gender\",\"city\",\"employer\",\"state\",\"age\",\"email\",\"lastname\"],\"excludes\":[]}}, requestedTotalSize=2147483647, pageSize=null, startFrom=0)])\n"}} No newline at end of file | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like JSON file instead of YAML.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
| String timestampStr = (String) timestamp; | ||
| java.time.LocalDateTime localDateTime = | ||
| java.time.LocalDateTime.parse( | ||
| timestampStr, DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this pattern is different from something used in time_bins_test_data.json file?
And I was not quite sure if we can always assume this format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test data uses ISO 8601 format (2025-07-28T10:00:00 with 'T') for ingestion into OpenSearch, but when Calcite retrieves timestamp values and converts them to strings, it uses the SQL literal format (yyyy-MM-dd HH:mm:ss with space).
| private static String formatTimestamp(long timestampMillis) { | ||
| Instant instant = Instant.ofEpochMilli(timestampMillis); | ||
| ZonedDateTime zdt = instant.atZone(ZoneOffset.UTC); | ||
| return zdt.format(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make DateTimeFormatter instance a constant? (it is used above as well)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great suggestion!
I updated the following:
For parsing (input) - uses DATE_TIMESTAMP_FORMATTER
For formatting (output) - uses SQL_LITERAL_DATE_TIME_FORMAT
Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Kai Huang <[email protected]>
Description
Summary
This PR extends the
bincommand to support time-related fields (timestamp,date,time) in non-aggregation scenarios using theWIDTH_BUCKETUDF.Previously,
binson time fields only worked with aggregations viaauto_date_histogrampushdown.Now, users can apply binning on timestamp fields in projection queries without aggregation.
Related Issues
Resolves ##4578