@kadirozde
Contributor

No description provided.

Contributor

@ujjawal4046 ujjawal4046 left a comment

@tkhurana has already asked, in a comment on the Jira, how to account for all versions and delete markers in a row.

@kadirozde A related question:

  • If we introduce a separate RAW_ROW_SIZE for such a case, can it be used in conjunction with other functions (e.g. select raw_row_size, count(*) from table group by tenant_id;)? I ask because raw_row_size would need a raw scan, which may not be compatible with other queries (which assume that a scan always returns only the most recent version).
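
To make the raw-scan concern above concrete, here is a toy Java model. It is not Phoenix or HBase code: the Cell class, the method names, and the byte sizes are all hypothetical and chosen only for illustration. It assumes a regular scan sees only the newest cell per column (and nothing for a column whose newest cell is a delete marker), while a raw scan sees every version and every delete marker.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RawRowSizeSketch {
    /** Hypothetical stand-in for an HBase cell (not the real Cell interface). */
    static final class Cell {
        final String column;
        final long timestamp;
        final boolean deleteMarker;
        final int sizeBytes;

        Cell(String column, long timestamp, boolean deleteMarker, int sizeBytes) {
            this.column = column;
            this.timestamp = timestamp;
            this.deleteMarker = deleteMarker;
            this.sizeBytes = sizeBytes;
        }
    }

    /** What a raw scan would add up: every version and every delete marker. */
    static long rawRowSize(List<Cell> row) {
        long total = 0;
        for (Cell c : row) {
            total += c.sizeBytes;
        }
        return total;
    }

    /**
     * What a regular scan would add up in this simplified model: the newest
     * cell per column, skipping columns whose newest cell is a delete marker.
     * Timestamp ties are not modeled.
     */
    static long latestVersionRowSize(List<Cell> row) {
        Map<String, Cell> newest = new HashMap<>();
        for (Cell c : row) {
            Cell seen = newest.get(c.column);
            if (seen == null || c.timestamp > seen.timestamp) {
                newest.put(c.column, c);
            }
        }
        long total = 0;
        for (Cell c : newest.values()) {
            if (!c.deleteMarker) {
                total += c.sizeBytes;
            }
        }
        return total;
    }

    /** Made-up row: column A has two versions, column B was written and then deleted. */
    static List<Cell> sampleRow() {
        return Arrays.asList(
            new Cell("A", 1L, false, 40),
            new Cell("A", 2L, false, 40),
            new Cell("B", 1L, false, 30),
            new Cell("B", 3L, true, 20));
    }

    public static void main(String[] args) {
        List<Cell> row = sampleRow();
        System.out.println("raw size    = " + rawRowSize(row));           // 130
        System.out.println("latest size = " + latestVersionRowSize(row)); // 40
    }
}
```

In this model the two sizes differ for the same row, which is why mixing a raw-scan function with functions that assume latest-version semantics in one query is a fair thing to ask about.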

@ujjawal4046, I added support for RAW_ROW_SIZE() in this PR.

* Function to return the total size of the HBase cells that constitute a given row
*/
@BuiltInFunction(name = RowSizeFunction.NAME, nodeClass = RowSizeParseNode.class, args = {})
public class RowSizeFunction extends ScalarFunction {
Contributor

@ujjawal4046 ujjawal4046 Sep 30, 2025

Are scalar functions evaluated on the server side as well, or only on the client side? If it's the client side, do we need to fetch the whole row back to the client for the size computation?

Contributor Author

It can be evaluated on the client side to check whether the where clause evaluates to true on an empty tuple, and/or when it is specified as a top-level expression node in a select clause. This PR does not allow the row_size function to be a top-level node in a select clause. In this PR, a row is never returned to the client; only its size is returned, as part of an aggregation function result.
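
The last point (rows stay on the server and only an aggregated size travels back) can be pictured with a small plain-Java model. This is a sketch under stated assumptions, not the PR's implementation: the Row class, sumRowSizesByTenant, and the sample sizes are hypothetical names and numbers invented here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ServerSideRowSizeSketch {
    /** Hypothetical row: a tenant id plus the sizes (in bytes) of its cells. */
    static final class Row {
        final String tenantId;
        final int[] cellSizes;

        Row(String tenantId, int... cellSizes) {
            this.tenantId = tenantId;
            this.cellSizes = cellSizes;
        }
    }

    /**
     * Models the server side: each row's size is computed where the row lives
     * and folded into a per-tenant sum. Only this map would be sent back to
     * the client; the rows themselves never leave this method.
     */
    static Map<String, Long> sumRowSizesByTenant(List<Row> regionRows) {
        Map<String, Long> totals = new TreeMap<>();
        for (Row row : regionRows) {
            long rowSize = 0;
            for (int size : row.cellSizes) {
                rowSize += size;
            }
            totals.merge(row.tenantId, rowSize, Long::sum);
        }
        return totals;
    }

    /** Made-up data: two rows for tenant t1 and one row for tenant t2. */
    static List<Row> sampleRegion() {
        return Arrays.asList(
            new Row("t1", 40, 40, 30),
            new Row("t1", 25),
            new Row("t2", 10, 15));
    }

    public static void main(String[] args) {
        // The "client" only ever sees the aggregated sizes.
        System.out.println(sumRowSizesByTenant(sampleRegion())); // {t1=135, t2=25}
    }
}
```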

boolean asSubquery, boolean allowPageFilter, QueryPlan innerPlan, boolean inJoin,
boolean inUnion) throws SQLException {
for (AliasedNode node : select.getSelect()) {
if (node.getNode() instanceof RowSizeParseNode) {
Contributor

I am not clear on the usage. Does this mean we can't write a query in which row_size has to be fetched for each row (e.g. select row_size() from table, or select row_size() from table group by tenant_id)?

Contributor Author

Please see the exception message and also the row size test to see how to get individual row sizes.

}
if (context.hasRowSizeFunction()) {
scan.getFamilyMap().clear();
ScanUtil.removePageFilter(scan);
Contributor

I see the need to clear the family map so that the scan accounts for cells across all column families.

Why do we need to remove the page filter?

Contributor Author

Page filter removal was not intentional. Good catch!
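
The family-map point in this thread can be illustrated with a toy Java model. It is only a sketch: the Cell class, sizeSeenByScan, the family names, and the sizes are hypothetical. The one behavior it borrows from HBase is that a scan with no column families selected reads every family.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FamilyMapSketch {
    /** Hypothetical cell: the column family it belongs to and its size in bytes. */
    static final class Cell {
        final String family;
        final int sizeBytes;

        Cell(String family, int sizeBytes) {
            this.family = family;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * Sums the sizes of the cells a scan would visit. An empty family set
     * means "all families", as with a scan whose family map has been cleared.
     */
    static long sizeSeenByScan(List<Cell> row, Set<String> scanFamilies) {
        long total = 0;
        for (Cell c : row) {
            if (scanFamilies.isEmpty() || scanFamilies.contains(c.family)) {
                total += c.sizeBytes;
            }
        }
        return total;
    }

    /** Made-up row with cells in two column families, "0" and "cf2". */
    static List<Cell> sampleRow() {
        return Arrays.asList(new Cell("0", 50), new Cell("0", 30), new Cell("cf2", 70));
    }

    public static void main(String[] args) {
        Set<String> restricted = new HashSet<>(Collections.singletonList("0"));
        Set<String> cleared = Collections.emptySet();
        System.out.println("restricted to family 0: " + sizeSeenByScan(sampleRow(), restricted)); // 80
        System.out.println("family map cleared:     " + sizeSeenByScan(sampleRow(), cleared));    // 150
    }
}
```

With the family restriction in place the model undercounts the row, which matches the reviewer's reading of why the family map is cleared.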

@tkhurana
Contributor

tkhurana commented Oct 3, 2025

java.net.SocketTimeoutException: callTimeout=60000, callDuration=68935: java.io.IOException: org.apache.phoenix.expression.OrExpression cannot be cast to org.apache.phoenix.expression.function.SingleAggregateFunction
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:479)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
    at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102)
    at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82)
Caused by: java.lang.ClassCastException: org.apache.phoenix.expression.OrExpression cannot be cast to org.apache.phoenix.expression.function.SingleAggregateFunction
    at org.apache.phoenix.expression.aggregator.ServerAggregators.deserialize(ServerAggregators.java:118)
    at org.apache.phoenix.coprocessor.UngroupedAggregateRegionScanner.next(UngroupedAggregateRegionScanner.java:613)
    at org.apache.phoenix.coprocessor.UngroupedAggregateRegionScanner.nextRaw(UngroupedAggregateRegionScanner.java:595)
    at org.apache.phoenix.coprocessor.DelegateRegionScanner.next(DelegateRegionScanner.java:108)
    at org.apache.phoenix.coprocessor.DelegateRegionScanner.nextRaw(DelegateRegionScanner.java:77)
    at org.apache.phoenix.coprocessor.DelegateRegionScanner.next(DelegateRegionScanner.java:108)
    at org.apache.phoenix.coprocessor.DelegateRegionScanner.nextRaw(DelegateRegionScanner.java:77)
    at org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.nextRaw(BaseScannerRegionObserver.java:266)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3403)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3669)
    at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:43508)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:415)
    ... 3 more

@kadirozde
Contributor Author

@tkhurana, thank you for pointing out the test failure. It should be fixed now.

@kadirozde kadirozde merged commit 59474cd into apache:master Oct 6, 2025
1 check failed