michel-aractingi
Collaborator

What this does

Branched from #1894, fixed tests and added lazy loading

Simon Alibert and others added 30 commits February 10, 2025 16:39
Co-authored-by: Simon Alibert <[email protected]>
Co-authored-by: Remi Cadene <[email protected]>
Co-authored-by: Remi <[email protected]>
…_v2.1' into user/rcadene/2025_02_19_port_openx
michel-aractingi force-pushed the user/michel-aractingi/2025-9-9-incremental_parquet_writing branch from 783a6b3 to b0bc4a3 September 12, 2025 10:08
michel-aractingi changed the base branch from user/michel-aractingi/2025_06_30_dataset_v3 to main September 12, 2025 14:40
michel-aractingi changed the base branch from main to user/michel-aractingi/2025_06_30_dataset_v3 September 12, 2025 15:44
michel-aractingi changed the base branch from user/michel-aractingi/2025_06_30_dataset_v3 to main September 12, 2025 15:45
michel-aractingi force-pushed the user/michel-aractingi/2025-9-9-incremental_parquet_writing branch from 66801d5 to 38c5c20 September 15, 2025 08:15
Comment on lines +114 to +115
writer = getattr(self, "writer", None)
if writer is not None:
Contributor

Any opposition to the walrus operator (:=)?

if (writer := getattr(self, "writer", None)):

Collaborator Author

Not at all. I just think we don't use it anywhere else in the code, and for readability it's a bit better as is.
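For readers comparing the two forms discussed above, here is a minimal sketch. The classes `EmptyBuffer` and `Recorder` and the three helper functions are hypothetical and exist only for illustration; they are not part of the PR. The point it shows is that the bare walrus form tests truthiness, while the two-line form tests `is not None`. For an object with default truthiness the two behave the same, but they can diverge for a writer that happens to be falsy.

```python
class EmptyBuffer(list):
    """Hypothetical writer that is falsy while empty (it subclasses list)."""

    closed = False

    def close(self) -> None:
        self.closed = True


class Recorder:
    """Hypothetical owner that may or may not have a `writer` attribute."""


def close_two_step(obj) -> bool:
    # Form used in the PR: explicit None check.
    writer = getattr(obj, "writer", None)
    if writer is not None:
        writer.close()
        return True
    return False


def close_walrus_truthy(obj) -> bool:
    # Form suggested in the review: tests truthiness, not identity with None.
    if (writer := getattr(obj, "writer", None)):
        writer.close()
        return True
    return False


def close_walrus_explicit(obj) -> bool:
    # Walrus form that keeps the explicit None check.
    if (writer := getattr(obj, "writer", None)) is not None:
        writer.close()
        return True
    return False


recorder = Recorder()
recorder.writer = EmptyBuffer()  # present, but falsy
results = (
    close_two_step(recorder),
    close_walrus_truthy(recorder),
    close_walrus_explicit(recorder),
)
no_writer = (
    close_two_step(Recorder()),
    close_walrus_truthy(Recorder()),
    close_walrus_explicit(Recorder()),
)
```

With the falsy stand-in writer, only the truthiness-based form skips the close; when the attribute is missing, all three forms skip it.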

Copilot AI review requested due to automatic review settings September 24, 2025 13:36
Contributor

Copilot AI left a comment

Pull Request Overview

This PR implements incremental parquet writing functionality for LeRobot datasets, improving performance during data collection by avoiding expensive file reloads and memory operations between episodes.

Key changes include:

  • Implements PyArrow-based incremental parquet writing with persistent writers
  • Adds lazy loading mechanism to defer expensive dataset loading until actually needed for reading
  • Introduces finalize() method to properly close parquet writers and complete dataset creation

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/lerobot/datasets/lerobot_dataset.py | Core implementation of incremental parquet writing, lazy loading, and writer management |
| src/lerobot/datasets/video_utils.py | Adds finalize() call to properly close writers in video recording context |
| tests/datasets/test_datasets.py | Updates test cases to call finalize() method after dataset creation |

michel-aractingi force-pushed the user/michel-aractingi/2025-9-9-incremental_parquet_writing branch from 2c54ed4 to 5df5f0f September 29, 2025 13:25
8 participants