@HammadB
Collaborator

@HammadB HammadB commented Sep 2, 2025

Description of changes

Summarize the changes made by this PR.

  • Improvements & Bug fixes
    • This PR performs a minor refactor of block decoding to use serde_bytes, which handles serde [u8] reads better by allocating upfront, and changes the internal block-decoding helpers to operate on &[u8] instead of io streams. The io streams existed to unify the memory and file cases, but since the arrow file reader reads the file into an internal buffer anyway, we can read the file ourselves and standardize the internal interface on &[u8].
    • One copy is still made since foyer does not pass us ownership of the buffer.
    • There is some refactoring of arrow logic around footer parsing/field parsing for reuse
  • New functionality
    • /
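
To illustrate the interface change described above, here is a minimal std-only sketch, not the code from this PR. The names `DecodedBlock`, `decode_from_slice`, and `decode_from_reader`, as well as the toy length-prefixed layout, are assumptions made for the example; the real blocks are Arrow IPC data. The point is only that once the core helper takes `&[u8]`, the file case reduces to "read into a buffer, then call the slice helper", and a single copy remains because the caller keeps ownership of the buffer.

```rust
use std::io::{self, Read};

/// Hypothetical decoded block that owns its payload bytes.
#[derive(Debug, PartialEq)]
pub struct DecodedBlock {
    pub payload: Vec<u8>,
}

/// Core helper: operates on a borrowed slice so that in-memory and
/// file-backed callers share one code path.
/// Toy layout (an assumption for this sketch): a 4-byte little-endian
/// payload length followed by the payload itself.
pub fn decode_from_slice(bytes: &[u8]) -> io::Result<DecodedBlock> {
    let invalid = |msg: &'static str| io::Error::new(io::ErrorKind::InvalidData, msg);

    let prefix = bytes
        .get(..4)
        .ok_or_else(|| invalid("missing length prefix"))?;
    let mut header = [0u8; 4];
    header.copy_from_slice(prefix);
    let len = u32::from_le_bytes(header) as usize;

    let body = bytes
        .get(4..)
        .and_then(|rest| rest.get(..len))
        .ok_or_else(|| invalid("payload shorter than declared length"))?;

    // The one remaining copy: the caller still owns `bytes`.
    Ok(DecodedBlock { payload: body.to_vec() })
}

/// File-style entry point: read everything into a buffer first,
/// then reuse the slice-based helper.
pub fn decode_from_reader<R: Read>(mut reader: R) -> io::Result<DecodedBlock> {
    let mut buf = Vec::new();
    reader.read_to_end(&mut buf)?;
    decode_from_slice(&buf)
}
```

With this shape, a cache hit can pass its borrowed bytes straight to `decode_from_slice`, while a disk read goes through `decode_from_reader` and ends up on the same path.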

Test plan

How are these changes tested?
Existing test coverage suffices, as these are non-functional changes at the boundary.

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Migration plan

None required

Observability plan

We should audit staging environment use cases closely

Documentation Changes

None

@github-actions

github-actions bot commented Sep 2, 2025

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

@propel-code-bot
Contributor

propel-code-bot bot commented Sep 2, 2025

Optimize Block Decoding Using serde_bytes and Improved Buffer Handling

This PR refactors the block decoding logic in the blockstore module, transitioning internal decoding helpers to operate directly on byte slices (&[u8]) rather than I/O streams. It introduces the use of the serde_bytes crate for more efficient and clearer serialization/deserialization of binary data. Logic for Arrow IPC file parsing, particularly around footer and record batch extraction, has been consolidated into reusable helper functions. The changes also affect how block files are read from disk: instead of using streams, files are read fully into buffers before processing. This approach standardizes all in-memory and file-based logic and enables clearer code paths. Associated dependencies (serde_bytes) are introduced in Cargo.toml and Cargo.lock.

Key Changes

• Refactored block decoding helpers to operate on &[u8] instead of std::io streams.
• Integrated serde_bytes for [u8] serialization/deserialization, replacing manual Vec handling.
• Simplified logic for reading Arrow IPC files, extracting footer and record batch parsing into reusable functions.
• Reworked file-based block loading to fully read file contents into memory before decoding.
• Modified Cargo.toml and Cargo.lock to add serde_bytes dependency.
• Removed redundant wrapper functions, such as from_bytes_internal, in favor of more direct logic.
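
As a rough illustration of what footer extraction over a `&[u8]` can look like, the sketch below locates the footer bytes of an Arrow IPC file using only the standard library. It relies on the published file layout: the `ARROW1` magic padded to 8 bytes at the start, and at the end the footer, a 4-byte little-endian footer length, and the `ARROW1` magic again. The names `footer_bytes` and `FooterError` are hypothetical and are not taken from this PR or from the arrow crate; decoding the returned FlatBuffers footer is out of scope here.

```rust
use std::convert::TryFrom;

const ARROW_MAGIC: &[u8] = b"ARROW1";
/// Leading magic padded to an 8-byte boundary.
const HEADER_LEN: usize = 8;
/// Trailing 4-byte footer length plus 6-byte magic.
const TRAILER_LEN: usize = 4 + 6;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum FooterError {
    TooShort,
    BadMagic,
    BadLength,
}

/// Returns the slice that holds the still-encoded footer of an
/// Arrow IPC file that has been fully read into memory.
pub fn footer_bytes(file: &[u8]) -> Result<&[u8], FooterError> {
    if file.len() < HEADER_LEN + TRAILER_LEN {
        return Err(FooterError::TooShort);
    }
    if !file.starts_with(ARROW_MAGIC) || !file.ends_with(ARROW_MAGIC) {
        return Err(FooterError::BadMagic);
    }

    // The footer length sits immediately before the trailing magic.
    let len_start = file.len() - TRAILER_LEN;
    let mut len_raw = [0u8; 4];
    len_raw.copy_from_slice(&file[len_start..len_start + 4]);
    let footer_len = usize::try_from(i32::from_le_bytes(len_raw))
        .map_err(|_| FooterError::BadLength)?;

    // The footer must fit between the padded header and the trailer.
    let footer_start = len_start
        .checked_sub(footer_len)
        .filter(|start| *start >= HEADER_LEN)
        .ok_or(FooterError::BadLength)?;

    Ok(&file[footer_start..len_start])
}
```

Because the helper borrows from the input slice, it performs no allocation and works the same way whether the bytes came from a cache or from a file read into a buffer.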

Affected Areas

• rust/blockstore/src/arrow/block/types.rs
• Cargo.toml
• Cargo.lock
• rust/blockstore/Cargo.toml

This summary was automatically generated by @propel-code-bot

@HammadB HammadB requested a review from sanketkedia September 4, 2025 21:09
@HammadB HammadB enabled auto-merge (squash) September 8, 2025 20:04
@HammadB HammadB merged commit f3def7b into main Sep 9, 2025
113 of 115 checks passed
jairad26 added a commit that referenced this pull request Sep 17, 2025
sanketkedia added a commit that referenced this pull request Sep 18, 2025
## Description of changes

_Summarize the changes made by this PR._

- Improvements & Bug fixes
  - Reverts #5396
- New functionality
  - ...

## Test plan

_How are these changes tested?_
- [x] Tests pass locally with `pytest` for python, `yarn test` for js,
`cargo test` for rust

## Migration plan
None

## Observability plan
lodc errors should go away in prod

## Documentation Changes
None
chroma-droid pushed a commit that referenced this pull request Sep 18, 2025
3 participants