Skip to content

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Nov 1, 2025

Which issue does this PR close?

Rationale for this change

@etseidl comments: #8756 (comment)

Not relevant to this PR, but I think this TODO has largely been addressed by #8376 which enabled skipping the decoding of the page statistics.

While I was in here, I also wanted to capture the learning based on @mapleFU 's comment #8756 (comment)

The code looks good to me but the I don't know if the comment "not compressed" can be replaced, if decompress_buffer is called and decompressed_size == 0 , seems that it generally means something like "this page only have levels, but not have non-null values"? ( Point me out if I'm wrong)

What changes are included in this PR?

Include some comments

Are these changes tested?

No (there are no code changes)

Are there any user-facing changes?

No, this is internal comments only. No code / behavior changes

@alamb alamb marked this pull request as ready for review November 1, 2025 10:52
@github-actions github-actions bot added the parquet Changes to the parquet crate label Nov 1, 2025
@alamb alamb changed the title Update comments in page decompressor Minor: Update comments in page decompressor Nov 1, 2025
@alamb alamb changed the title Minor: Update comments in page decompressor [Paruqet] Minor: Update comments in page decompressor Nov 1, 2025
Comment on lines +399 to +400
// decompressed size of zero corresponds to a page with only null values
// see https://github.com/apache/parquet-format/blob/master/README.md#data-pages
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally this is ok to me, but I don't know would some page being just be empty (with num-rows == 0), I know currently most writer would not write that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I did some code coverage testing, I found it was rare but not impossible (it happens 3 times in the parquet-rs suite: #8756 (comment))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would page without values or some word better here?

Copy link
Contributor

@etseidl etseidl Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would page without values or some word better here?

I think the current wording is correct. In this case num_values is non-zero, so there are values, but they all just happen to be null (and thus are not encoded in the data).

I'd think the ways decompressed_size can be zero are 1) v2 page with all null values, 2) v1 page with no nesting and all nulls at the top level (so D is always 0). Second case would have definition level data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current wording is correct. In this case num_values is non-zero, so there are values, but they all just happen to be null (and thus are not encoded in the data).

I mean, I don't know would a case that just num_rows ( not num_values ) == 0 would exists

@alamb alamb added the documentation Improvements or additions to documentation label Nov 3, 2025
@alamb alamb changed the title [Paruqet] Minor: Update comments in page decompressor [Parquet] Minor: Update comments in page decompressor Nov 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation parquet Changes to the parquet crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants