Contributor

@lilianm lilianm commented Sep 1, 2025

Which issue does this PR close?

Rationale for this change

Allow the ArrowRowGroupWriter helper to be used for writing a row group when you use the get_column_writers / append_row_group API in ArrowWriter, implemented in issue

What changes are included in this PR?

Make ArrowRowGroupWriter public and move memory_size, get_estimated_total_bytes and rows_count into it from ArrowWriter

Are these changes tested?

Yes

Are there any user-facing changes?

Yes, this adds functions to ArrowRowGroupWriter and exposes it publicly

@github-actions github-actions bot added the parquet Changes to the parquet crate label Sep 1, 2025
Contributor

@alamb alamb left a comment

Thank you for this @lilianm -- I think this idea seems reasonable to me

If we want to make this a public API, I think we should add some more documentation -- specifically, can we please add a doc test that shows how a user will use the ArrowRowGroupWriter?

Specifically, I am thinking about something like this https://docs.rs/parquet/latest/parquet/arrow/arrow_writer/struct.ArrowColumnWriter.html#example-encoding-two-arrow-arrays-in-parallel

@alamb alamb marked this pull request as draft September 6, 2025 10:19
Contributor

alamb commented Sep 6, 2025

Marking as draft as I think this PR is no longer waiting on feedback and I am trying to make it easier to find PRs in need of review. Please mark it as ready for review when it is ready for another look

Development

Successfully merging this pull request may close these issues.

[Parquet] Expose ArrowRowGroupWriter