feat(parquetconverter): add support for additional sort columns during Parquet file generation #7003
base: master
…g Parquet file generation Signed-off-by: Angith <[email protected]>
…Parquet file generation Signed-off-by: Angith <[email protected]>
Hi @yeya24 I’ve raised this PR to add support for additional sort columns during Parquet file generation. A few points I wanted to clarify:
Thanks in advance for your guidance 🙏
…er/add-sort-columns
…lumns Signed-off-by: Angith <[email protected]>
Hi @yeya24, I’ve made updates to address the CI failure. When you get a chance, could you approve the workflow?
Sorry for the late review. Thanks for the contribution and I think this change looks great!
Just some comments about the configuration. If the sorting columns are already configurable as limits, we don't need the same config in the Parquet converter anymore.
pkg/parquetconverter/converter.go
Outdated
if len(cfg.AdditionalSortColumns) > 0 {
sortColumns = append(sortColumns, cfg.AdditionalSortColumns...)
}
cfg.AdditionalSortColumns = sortColumns
We can maybe just remove the sorting column from the base converter options if we have the limits.
I’ve removed the sorting column from the base converter option. In the limits, the metric name is added as the default value.
pkg/parquetconverter/converter.go
Outdated
converterOpts := append(c.baseConverterOptions, convert.WithName(b.ULID.String()))

userConfiguredSortColumns := c.limits.ParquetConverterSortColumns(userID)
if len(userConfiguredSortColumns) > 0 {
We can remove the if and always use append(sortColumns, userConfiguredSortColumns...) as sorting columns.
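For readers following along, here is a minimal, standalone sketch of why the length check is redundant: in Go, appending a nil or empty slice adds no elements. The function name `mergeSortColumns` and the example label names are hypothetical and are not taken from the Cortex code; `__name__` is used only because the thread mentions the metric name as the default sort column. The sketch also copies into a fresh slice, which is a design choice of this example rather than something the PR is known to do.

```go
package main

import "fmt"

// mergeSortColumns (hypothetical helper) returns the base sort columns
// followed by the user-configured ones. Appending a nil or empty slice is a
// no-op, so no `if len(...) > 0` guard is required.
// A new slice is allocated so that the caller's base slice is never modified,
// even when it has spare capacity.
func mergeSortColumns(base, userConfigured []string) []string {
	merged := make([]string, 0, len(base)+len(userConfigured))
	merged = append(merged, base...)
	return append(merged, userConfigured...)
}

func main() {
	base := []string{"__name__"} // assumed default: the metric name label

	// No user-configured columns: the result is just the base columns.
	fmt.Println(mergeSortColumns(base, nil)) // prints [__name__]

	// With user-configured columns: they follow the base columns in order.
	fmt.Println(mergeSortColumns(base, []string{"cluster", "namespace"})) // prints [__name__ cluster namespace]
}
```

Allocating a fresh slice avoids a subtle aliasing issue: calling `append` directly on a shared slice that has spare capacity can overwrite elements seen by other holders of that slice.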
Done 👍
docs/guides/parquet-mode.md
Outdated
1. **Row Group Size**: Adjust `max_rows_per_row_group` based on your query patterns
2. **Cache Size**: Tune `parquet_queryable_shard_cache_size` based on available memory
3. **Concurrency**: Adjust `meta_sync_concurrency` based on object storage performance
4. **Sort Columns**: Configure `sort_columns` based on your most common query filters to improve query performance
Let's use the full config name parquet_converter_sort_columns
Done 👍
Hi @yeya24, let me work on the comments and get back to you shortly.
…er/add-sort-columns
Signed-off-by: Angith <[email protected]>
Hi @yeya24, thank you for the feedback. I’ve incorporated all the review comments. Could you please take another look?
I am sorry. Can you help resolve the conflict? I think what I just merged conflicts with what you added in this PR.
The code change looks great! The only conflict here is that we have started to unhide the parquet limits in the docs, so you need to remove the doc: hidden part.
…er/add-sort-columns
@yeya24, I have resolved the conflict. Could you please verify?
We need to fix lint. Can you run |
…umns configuration Signed-off-by: Angith <[email protected]>
@yeya24 I have run the |
@Angith It seems that the lint still failed due to whitespace. Can you try running the make target locally?
Other test failures are unrelated. I will retry them.
What this PR does:
Which issue(s) this PR fixes:
Fixes #6941
Checklist
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]