Skip to content

Conversation

Angith
Copy link

@Angith Angith commented Sep 1, 2025

What this PR does:

Which issue(s) this PR fixes:
Fixes #6941

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@Angith Angith changed the title Parquetconverter/add sort columns feat(parquetconverter): add support for additional sort columns during Parquet file generation Sep 1, 2025
@Angith
Copy link
Author

Angith commented Sep 1, 2025

Hi @yeya24

I’ve raised this PR to add support for additional sort columns during Parquet file generation. A few points I wanted to clarify:

  • As per the contributing guide, I ran make doc, but it ended up deleting some other important documents. I also ran goimports, but it reformatted ~700 files, including files in the vendor folder. I assume these shouldn’t be committed, so I left those changes out. Could you please confirm the expected workflow here?
  • I’ve written unit tests for the new functionality and verified, but I haven’t yet tested the change end-to-end. Could you guide me on how to run the full end-to-end tests for Cortex to validate this change?

Thanks in advance for your guidance 🙏

@Angith
Copy link
Author

Angith commented Sep 7, 2025

Hi @yeya24, I’ve made updates to address the CI failure. When you get a chance, could you approve the workflow?

Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late review. Thanks for the contribution and I think this change looks great!

Just some comments about the configuration. If we already have the sorting columns configurable as limits, we don't need the same config in parquet converter anymore

if len(cfg.AdditionalSortColumns) > 0 {
sortColumns = append(sortColumns, cfg.AdditionalSortColumns...)
}
cfg.AdditionalSortColumns = sortColumns
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can maybe just remove sorting column from base converter option if we have the limits

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve removed the sorting column from the base converter option. In the limits, the metric name is added as the default value.

converterOpts := append(c.baseConverterOptions, convert.WithName(b.ULID.String()))

userConfiguredSortColumns := c.limits.ParquetConverterSortColumns(userID)
if len(userConfiguredSortColumns) > 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove the if and always use append(sortColumns, userConfiguredSortColumns...) as sorting columns.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

1. **Row Group Size**: Adjust `max_rows_per_row_group` based on your query patterns
2. **Cache Size**: Tune `parquet_queryable_shard_cache_size` based on available memory
3. **Concurrency**: Adjust `meta_sync_concurrency` based on object storage performance
4. **Sort Columns**: Configure `sort_columns` based on your most common query filters to improve query performance
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use the full config name parquet_converter_sort_columns

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

@Angith
Copy link
Author

Angith commented Sep 15, 2025

Hi @yeya24, let me work on the comments and revert to you shortly.

@pull-request-size pull-request-size bot added size/M and removed size/L labels Sep 21, 2025
@Angith
Copy link
Author

Angith commented Sep 21, 2025

Hi @yeya24, Thank you for the feedback. I’ve incorporated all the review comments. Could you please take another look?

@Angith Angith requested a review from yeya24 September 21, 2025 13:33
@yeya24
Copy link
Contributor

yeya24 commented Sep 21, 2025

I am sorry. Can you help resolve the conflict? I think what I just merged conflicts to what you added in this PR

Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code change looks great! The only conflict here is that we start to mark parquet limits as unhidden from doc so you need to remove the doc: hidden part

@Angith
Copy link
Author

Angith commented Sep 22, 2025

@yeya24, I have resolved the conflict. Could you please verify?

@yeya24
Copy link
Contributor

yeya24 commented Sep 22, 2025

We need to fix lint. Can you run make doc or make docs?

@Angith
Copy link
Author

Angith commented Sep 23, 2025

@yeya24 I have run the make doc command and pushed the changes. Could you please check now?

@yeya24
Copy link
Contributor

yeya24 commented Sep 24, 2025

@Angith It seems that the lint still failed due to white noise. Can you try running the make target locally?

Run make BUILD_IN_CONTAINER=false check-white-noise
  make BUILD_IN_CONTAINER=false check-white-noise
  shell: sh -e {0}
Please remove trailing whitespaces running 'make clean-white-noise'

Other test failures are unrelated. I will retry them

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow configuring sorting labels for Parquet Converter
2 participants