Skip to content

Conversation

raunaqmorarka
Copy link
Member

Description

Made changes to prefer adapting ordering operator from the
more efficient version of DefaultOrderingOperators#comparisonOperator
that avoids unnecessary copies of Slice for comparison

BenchmarkPagesIndexOrdering.benchmarkQuickSort
(numberOfChannels)  (typeName)  Mode  Cnt  Before score    After score    Unit
                 1      BIGINT  avgt   20  29.536 ± 0.696  28.299 ± 0.344 ms/op
                 1     VARCHAR  avgt   20  66.897 ± 1.914  53.538 ± 0.635 ms/op
                10      BIGINT  avgt   20  36.965 ± 0.270  36.577 ± 0.605 ms/op
                10     VARCHAR  avgt   20  79.887 ± 3.077  60.413 ± 0.678 ms/op

E.g. from current code

io.airlift.slice.Slice.slice(Slice.java:1042)
io.trino.spi.block.VariableWidthBlock.getSlice(VariableWidthBlock.java:160)
io.trino.spi.type.VarcharType.getSlice(VarcharType.java:188)
io.trino.$gen.PagesIndexComparator_20250923_193215_67.compareTo(Unknown Source)
io.trino.operator.PagesIndexOrdering.quickSort(PagesIndexOrdering.java:88)
io.trino.operator.PagesIndexOrdering.quickSort(PagesIndexOrdering.java:134)
io.trino.operator.PagesIndexOrdering.sort(PagesIndexOrdering.java:37)
io.trino.operator.PagesIndex.sort(PagesIndex.java:424)
io.trino.operator.WindowOperator.sortPagesIndexIfNecessary(WindowOperator.java:905)

Now this will use io.trino.spi.type.AbstractVariableWidthType.DefaultOrderingOperators#comparisonOperator(io.trino.spi.block.VariableWidthBlock, int, io.trino.spi.block.VariableWidthBlock, int) instead which avoids Slice copies

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## General
* Improve performance of queries with ORDER BY. ({issue}`issuenumber`)

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR improves the performance of ORDER BY queries by optimizing string comparisons in PagesIndexComparator. The key improvement is switching from using BLOCK_POSITION to BLOCK_POSITION_NOT_NULL arguments in ordering operators, which allows the system to use a more efficient comparison method that avoids unnecessary Slice copies for string types.

  • Updated ordering operator generation to use BLOCK_POSITION_NOT_NULL instead of BLOCK_POSITION for better performance
  • Added scoring logic for the new VALUE_BLOCK_POSITION_NOT_NULL argument convention
  • Enhanced benchmark tests to measure performance improvements for both BIGINT and VARCHAR types

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
TypeOperators.java Modified ordering operator generation to use more efficient comparison operators and added scoring for new argument convention
BenchmarkPagesIndexOrdering.java Enhanced benchmark to test both BIGINT and VARCHAR types with updated timing parameters

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Made changes to prefer adapting ordering operator from the
more efficient version of DefaultOrderingOperators#comparisonOperator
that avoids unnecessary copies of Slice for comparison

BenchmarkPagesIndexOrdering.benchmarkQuickSort
(numberOfChannels)  (typeName)  Mode  Cnt  Before score    After score    Unit
                 1      BIGINT  avgt  20   29.536 ± 0.696  28.299 ± 0.344 ms/op
                 1     VARCHAR  avgt  20   66.897 ± 1.914  53.538 ± 0.635 ms/op
                10      BIGINT  avgt  20   36.965 ± 0.270  36.577 ± 0.605 ms/op
                10     VARCHAR  avgt  20   79.887 ± 3.077  60.413 ± 0.678 ms/op
@starburstdata-automation
Copy link

starburstdata-automation commented Oct 2, 2025

Started benchmark workflow for this PR with test type = iceberg/sf1000_parquet_part.

Building Trino finished with status: success
Benchmark finished with status: failure
Comparing results to the static baseline values, follow above workflow link for more details/logs.
Status message: Found regressions for:<br/>(presto/tpcds, q04, totalCpuTime, over by 14.4%)<br/>(presto/tpcds, q78, totalCpuTime, over by 11.2%)
Benchmark Comparison to the closest run from Master: Report

@starburstdata-automation
Copy link

starburstdata-automation commented Oct 2, 2025

Started benchmark workflow for this PR with test type = iceberg/sf1000_parquet_unpart.

Building Trino finished with status: success
Benchmark finished with status: success
Comparing results to the static baseline values, follow above workflow link for more details/logs.
Status message: NO Regression found.
Benchmark Comparison to the closest run from Master: Report

@starburstdata-automation
Copy link

starburstdata-automation commented Oct 2, 2025

Started benchmark workflow for this PR with test type = iceberg/sf1000_parquet_part.

Building Trino finished with status: success
Benchmark finished with status: success
Comparing results to the static baseline values, follow above workflow link for more details/logs.
Status message: NO Regression found.
Benchmark Comparison to the closest run from Master: Report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

5 participants