Merged
184 commits
38c1457
Bump CODEBASE_VERSION
Feb 10, 2025
57c9c21
Merge remote-tracking branch 'origin/main' into user/aliberts/2025_02…
Feb 10, 2025
d67ca34
Merge remote-tracking branch 'origin/main' into user/aliberts/2025_02…
Feb 11, 2025
9d6886d
Add frame level task (#693)
Cadene Feb 14, 2025
7c2bbee
Validate features during `add_frame` + Add 2D-to-5D + Add string (#720)
Cadene Feb 14, 2025
8426c64
Per-episode stats (#521)
aliberts Feb 15, 2025
aed3eb4
Merge remote-tracking branch 'origin/main' into user/aliberts/2025_02…
Feb 15, 2025
624eaf1
Merge remote-tracking branch 'origin/main' into user/aliberts/2025_02…
Feb 17, 2025
02bc4e0
support openx/rlds to lerobot
Tavish9 Feb 18, 2025
fbf2f22
Remove `local_files_only` and use `codebase_version` instead of branc…
aliberts Feb 19, 2025
76436ca
Merge remote-tracking branch 'tavish9_lerobot_openx/main' into user/r…
Cadene Feb 19, 2025
2487228
Use `HF_HOME` env variable (#753)
aliberts Feb 19, 2025
6fe42a7
Add tag
Feb 19, 2025
969ef74
Remove dataset `consolidate` (#752)
aliberts Feb 19, 2025
392a8c3
Improve doc
Feb 20, 2025
64ed525
Fix batch convert
Feb 20, 2025
b520941
Merge remote-tracking branch 'origin/user/aliberts/2025_02_10_dataset…
Cadene Feb 20, 2025
71d1f5e
WIP
Cadene Feb 20, 2025
5fbbaa1
fix No such file or directory error
Cadene Feb 20, 2025
93c80b2
rm brake
Cadene Feb 20, 2025
52fb414
workers
Cadene Feb 21, 2025
15e7a9d
before new launch from scratch
Cadene Feb 21, 2025
eda0b99
new dir
Cadene Feb 21, 2025
689c5ef
optimize shard
Cadene Feb 22, 2025
39ad2d1
let's go
Cadene Feb 22, 2025
ff0029f
aggregate works
Cadene Feb 22, 2025
e2e6f6e
Add auto_downsample_height_width
Cadene Feb 23, 2025
c36d225
Aggregate works
Cadene Feb 23, 2025
3daab2a
Add upload_large_folder
Cadene Feb 23, 2025
3666ac9
WIP UploadDataset
Cadene Mar 1, 2025
7866c1f
Merge remote-tracking branch 'origin/main' into user/rcadene/2025_02_…
Cadene Mar 1, 2025
1a5c1ef
Rename openx to droid + Improve all (not tested)
Cadene Mar 18, 2025
5d184a7
NIT
Cadene Mar 18, 2025
65738f0
Improve slurm droid
Cadene Mar 20, 2025
53ecec5
WIP v21 to v30
Cadene Mar 31, 2025
c1b28f0
Commit before episodes episodes_stats merging
Cadene Apr 9, 2025
34c5d4c
Most unit tests are passing
Cadene Apr 11, 2025
6c4d122
fix joints
Cadene Apr 11, 2025
c2a05a1
Fix (Now loading all frames is possible)
Cadene Apr 14, 2025
6b6a990
most unit tests passing (TODO: convert datasets)
Cadene Apr 16, 2025
eab5543
Merge (No verify)
Cadene Apr 17, 2025
54b5c80
Revert mistake convert_dataset_v20_to_v21.py
Cadene Apr 17, 2025
b0cca75
Progress on aggregate_datasets
Cadene Apr 19, 2025
9c0836c
Remove legacy from datasets/utils.py
Cadene Apr 19, 2025
5a6ea09
Rename tests/test_aggregate_datasets.py -> tests/datasets/test_aggreg…
Cadene Apr 19, 2025
4acf99f
pre-commit run --all-files
Cadene Apr 21, 2025
4375a05
Add push to hub for convert_dataset_v21_to_v30
Cadene Apr 21, 2025
2866d07
small fix ffmpeg encoding
Cadene Apr 21, 2025
b9b880b
fix get_parquet_file_size_in_mb + DEFAULT_FILE_SIZE_IN_MB=100
Cadene Apr 21, 2025
20b74ae
fix
Cadene Apr 21, 2025
0a390de
Merge remote-tracking branch 'origin/main' into user/rcadene/2025_04_…
Cadene Apr 21, 2025
eaec52a
Merge remote-tracking branch 'origin/user/rcadene/2025_04_11_dataset_…
Cadene Apr 22, 2025
d4af224
Fix unit tests
Cadene Apr 22, 2025
8c43b3d
Faster self.meta.episodes[...]
Cadene Apr 22, 2025
01bc89b
Merge remote-tracking branch 'origin/user/rcadene/2025_04_11_dataset_…
Cadene Apr 23, 2025
ad1ad11
fix hf_dataset.set_transform(hf_transform_to_torch)
Cadene Apr 23, 2025
fde67db
Fix convert v30 with image datasets
Cadene Apr 24, 2025
6f0fc7f
Aggregate: Add concatenation
Cadene May 2, 2025
a231930
Fix aggregate (num_frames, dataset_from_index, index)
Cadene May 6, 2025
ee25664
Uploaded droid 1.0.1
Cadene May 8, 2025
220997f
Fix visualize_dataset with rerun
Cadene May 8, 2025
58795d7
In tests: Add use_videos=False by default, Create mp4 file if True, t…
Cadene May 12, 2025
13a1f68
WIP aggregate
Cadene May 16, 2025
ba022dd
Merge remote-tracking branch 'origin/user/rcadene/2025_04_11_dataset_…
Cadene May 16, 2025
8c1503d
WIP after Francesco discussion
Cadene May 28, 2025
d4fbf6e
add: support for videos generation in datasets
fracapuano Jun 6, 2025
378c147
fix: debug aggregation code
fracapuano Jun 6, 2025
848a494
add: tests for aggregation code
fracapuano Jun 6, 2025
01d0b7b
fix: modularize tests to improve readability
fracapuano Jun 10, 2025
c3e98db
add missing files for porting agibot
michel-aractingi Jun 30, 2025
d9b9cc8
fix(rebase) reverting files to main
michel-aractingi Jun 30, 2025
6b482a9
fix(rebase) deleting media related to tutorials
michel-aractingi Jun 30, 2025
0a1da47
fix(precommit) solve precommit issues
michel-aractingi Jun 30, 2025
5e39b4c
fix(tests)
michel-aractingi Jul 1, 2025
6de5670
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 1, 2025
0f66bbe
Migrate PR to new folder structure introduce on 1417
michel-aractingi Jul 2, 2025
9dde882
style nit
michel-aractingi Jul 2, 2025
1c17419
Reverted back files that were changed during the rebase
michel-aractingi Jul 2, 2025
012d428
Reverted back missing files in `src/lerobot/configs/`
michel-aractingi Jul 2, 2025
66454a0
Remove more references to lerobot.common
michel-aractingi Jul 2, 2025
69b1f7b
nit precommit
michel-aractingi Jul 2, 2025
830a3b9
Merge branch 'main' into user/michel-aractingi/2025_06_30_dataset_v3
michel-aractingi Jul 2, 2025
3dbc3e6
Added docstrings to aggregate, fix test_policies.py
michel-aractingi Jul 4, 2025
83bf24c
fix(tests) add features argument to `load_nested_dataset`
michel-aractingi Jul 5, 2025
bee74c3
Fix(tests) fix task index error in test_policies
michel-aractingi Jul 6, 2025
30ffa25
Merge branch 'main' into user/michel-aractingi/2025_06_30_dataset_v3
michel-aractingi Jul 6, 2025
9287c36
- Added missing license in the new scripts
michel-aractingi Jul 6, 2025
4a466d9
moved legacy functions to convert_stats.py
michel-aractingi Jul 6, 2025
18209e6
Added the use of `aggregate.py` in `slurm_aggregate_shards.py`
michel-aractingi Jul 7, 2025
c8a5df9
partial fix html visualization tool: Added `start_time` and `end_time…
michel-aractingi Jul 7, 2025
4e01f87
add: tests forcing new file creation
fracapuano Jun 11, 2025
a49760e
fix: tests depending on various sizes, and duration is updated
fracapuano Jun 11, 2025
a4d3a41
Added Francescos PRs for fixing aggregate.py
michel-aractingi Jul 8, 2025
6a9834e
Merge branch 'main' into user/michel-aractingi/2025_06_30_dataset_v3
michel-aractingi Jul 8, 2025
2a76135
Merge branch 'main' into user/michel-aractingi/2025_06_30_dataset_v3
michel-aractingi Jul 8, 2025
3483e44
Removed examples from import path in `port_datasets`
michel-aractingi Jul 15, 2025
e05d22c
Merge branch 'main' into user/michel-aractingi/2025_06_30_dataset_v3
michel-aractingi Jul 15, 2025
788dde3
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 17, 2025
4c0ac93
nit
michel-aractingi Jul 17, 2025
5ec70f7
removed check_timestamps_sync that is no longer used in the code,
michel-aractingi Jul 18, 2025
ec40fc4
Removed references to batch encoding to be added later or in another PR
michel-aractingi Jul 18, 2025
8ffc00d
Removed batch_encoding_Size from record.py
michel-aractingi Jul 18, 2025
23375cc
fix(tests) bug in clear_episode_buffer
michel-aractingi Jul 19, 2025
f98f01e
Merge branch 'main' into user/michel-aractingi/2025_06_30_dataset_v3
michel-aractingi Jul 19, 2025
ac0fd71
Merge branch 'main' into user/michel-aractingi/2025_06_30_dataset_v3
michel-aractingi Jul 21, 2025
dcb02a9
fix(convert_v1) use correct legacy path, remove comments from scripts…
michel-aractingi Jul 21, 2025
066b81a
moved concat_video function to video_utils, cleaned some code
michel-aractingi Jul 21, 2025
c993fea
Merge branch 'main' into user/michel-aractingi/2025_06_30_dataset_v3
michel-aractingi Jul 21, 2025
670d7f4
Merge branch 'main' into user/michel-aractingi/2025_06_30_dataset_v3
michel-aractingi Jul 21, 2025
218ebed
feat(convert_dataset_v21_to_v3) added the use of more efficient Datas…
michel-aractingi Jul 22, 2025
59d108a
fix(convert_v2_v3) reverted concat data files from previous commit
michel-aractingi Jul 29, 2025
788544d
update lerobot_dataset docstring
michel-aractingi Jul 29, 2025
6447352
added a check for comparing cached episodes in order to trigger a new…
michel-aractingi Jul 29, 2025
890b1e4
Merge branch 'main' into user/michel-aractingi/2025_06_30_dataset_v3
michel-aractingi Jul 29, 2025
527ae8e
Add variable-size test datasets (#1610)
fracapuano Jul 30, 2025
1c79e3d
Added mock context manager to tests in order to avoid calls to the hu…
michel-aractingi Jul 30, 2025
f94092c
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 30, 2025
4048b02
improved typing in `datasets/utils.py`
michel-aractingi Jul 31, 2025
267a753
Merge branch 'main' into user/michel-aractingi/2025_06_30_dataset_v3
michel-aractingi Aug 12, 2025
c7a3b02
fixed tensor indicies in `_check_cached_episode_sufficient` in lerobo…
michel-aractingi Aug 13, 2025
db36f01
add update_chunk_settings method for LeRobotDatasetMetadata. Introduc…
michel-aractingi Aug 17, 2025
2ca6edc
Merge branch 'main' into user/michel-aractingi/2025_06_30_dataset_v3
fracapuano Aug 25, 2025
64a9dd3
Removed agibot files and moved port_droid to port_datasets
michel-aractingi Aug 28, 2025
2b03dec
Removed .item from save_dataset_to_safetensors
michel-aractingi Aug 28, 2025
213ffe0
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 28, 2025
35f36e8
removed outdated todos
michel-aractingi Aug 28, 2025
000e887
removed unused functions from tests/fixtures
michel-aractingi Aug 28, 2025
bbd64b9
fixes in `datasets/utils.py`
michel-aractingi Aug 28, 2025
47aee1f
revert back `video_utils.py` to using pyav while keeping concat_video…
michel-aractingi Aug 28, 2025
84ffc28
moved `get_video_duration_in_s` to video_utils and replaced subproces…
michel-aractingi Aug 28, 2025
adad369
chore(dataset v1): drop support for dataset v1 format
CarolinePascal Sep 1, 2025
0a30636
chore(dataset v2.0): drop support for dataset v2.0 format
CarolinePascal Sep 1, 2025
4062d05
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 1, 2025
2df4e25
added the file and video max size as arguments
michel-aractingi Sep 2, 2025
2a3d622
visualize_dataset_html deprecated
michel-aractingi Sep 2, 2025
fdccf77
fix(memory explosion) added delete to episodes and hf_dataset everyti…
michel-aractingi Sep 3, 2025
0e04f5f
remove html templates and flask dependency
michel-aractingi Sep 3, 2025
7868df2
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Sep 3, 2025
1db3401
remove unused Iterable Namespace
michel-aractingi Sep 3, 2025
992fb17
further memory optimizations needed due to calling `pd.concat`
michel-aractingi Sep 3, 2025
0747afd
Optimize dataset updates by incrementally concatenating new data inst…
michel-aractingi Sep 5, 2025
952f455
fix(bug) in save_episode_data
michel-aractingi Sep 5, 2025
af79dda
fix(caching) remove cache dir when collecting a dataset with each cal…
michel-aractingi Sep 8, 2025
62a3361
fix(video utils): adding up-to-date support for batch encoding and vi…
CarolinePascal Sep 8, 2025
62bfbf3
fix tests
michel-aractingi Sep 8, 2025
9e0131c
fix in replay to filter for episode index in chunked data files
michel-aractingi Sep 12, 2025
6da0dd3
Added rigorous testing to validate the consistency of the meta data a…
michel-aractingi Sep 12, 2025
5ad1bb4
incremental parquet writing
hsandhawalia Sep 8, 2025
69435c2
add .finalise() and a backup __del__ for stopping writers
hsandhawalia Sep 9, 2025
fb299ef
fix missing import
hsandhawalia Sep 9, 2025
6db7ee2
precommit fixes added back the use of embed images
michel-aractingi Sep 9, 2025
435f48b
added lazy loading for hf_Dataset to avoid frequently reloading the d…
michel-aractingi Sep 9, 2025
420f9d9
fix bug in video timestamps
michel-aractingi Sep 10, 2025
aa5a064
Added proper closing of parquet file before reading
michel-aractingi Sep 10, 2025
b0bc4a3
Added rigorous testing to validate the consistency of the meta data a…
michel-aractingi Sep 10, 2025
3b4b082
fix bug in episode index during clear_episode_buffer
michel-aractingi Sep 12, 2025
3a7dc18
fix(empty concat): check for empty paths list before data files conca…
CarolinePascal Sep 12, 2025
552d0eb
Merge branch 'user/michel-aractingi/2025_06_30_dataset_v3' into user/…
michel-aractingi Sep 12, 2025
bcf0aab
Merge branch 'main' into user/michel-aractingi/2025_06_30_dataset_v3
michel-aractingi Sep 12, 2025
68f89ab
fix(v3.0 message): updating v3.0 backward compatibility message.
CarolinePascal Sep 12, 2025
244fd79
Merge branch 'user/michel-aractingi/2025_06_30_dataset_v3' into user/…
michel-aractingi Sep 12, 2025
38c5c20
Merge branch 'user/michel-aractingi/2025_06_30_dataset_v3' into user/…
michel-aractingi Sep 12, 2025
5a89df3
Merge branch 'main' into user/michel-aractingi/2025-9-9-incremental_p…
michel-aractingi Sep 24, 2025
58e8f6f
Merge branch 'main' into user/michel-aractingi/2025-9-9-incremental_p…
michel-aractingi Sep 24, 2025
b6ea78b
added fixes for the resume logic
michel-aractingi Sep 29, 2025
5df5f0f
answering co-pilot review
michel-aractingi Sep 29, 2025
0fe356c
reverting some changes and style nits
michel-aractingi Sep 29, 2025
5e05fc7
removed unused functions
michel-aractingi Sep 29, 2025
350dc6f
Merge branch 'main' into user/michel-aractingi/2025-9-9-incremental_p…
michel-aractingi Oct 1, 2025
291a825
fix chunk_id and file_id when resuming
michel-aractingi Sep 30, 2025
39ca3ad
- fix parquet loading when resuming
michel-aractingi Oct 1, 2025
8d4130a
added general function get_file_size_in_mb and removed the one for video
michel-aractingi Oct 1, 2025
decb864
fix table size value when resuming
michel-aractingi Oct 2, 2025
8ca4214
Remove unnecessary reloading of the parquet file when resuming record.
michel-aractingi Oct 3, 2025
1fdd00d
added back reading parquet file for image datasets only
michel-aractingi Oct 3, 2025
55e84e7
Merge branch 'main' into user/michel-aractingi/2025-9-9-incremental_p…
michel-aractingi Oct 3, 2025
0686adf
Merge branch 'main' into user/michel-aractingi/2025-9-9-incremental_p…
michel-aractingi Oct 7, 2025
98ef3ec
- respond to Qlhoest comments
michel-aractingi Oct 9, 2025
6d6eeed
Merge branch 'main' into user/michel-aractingi/2025-9-9-incremental_p…
michel-aractingi Oct 10, 2025
7e560b2
fix(dataset_tools) with the new logic using proper finalize
michel-aractingi Oct 10, 2025
6d7984b
nit in flush_metadata_buffer
michel-aractingi Oct 10, 2025
11c958b
fix(lerobot_dataset) return the right dataset len when a subset of th…
michel-aractingi Oct 10, 2025
47b5c79
Merge branch 'main' into user/michel-aractingi/2025-9-9-incremental_p…
michel-aractingi Oct 10, 2025
dceca64
Merge branch 'main' into user/michel-aractingi/2025-9-9-incremental_p…
michel-aractingi Oct 10, 2025
8 changes: 5 additions & 3 deletions src/lerobot/datasets/aggregate.py
@@ -31,8 +31,8 @@
     DEFAULT_EPISODES_PATH,
     DEFAULT_VIDEO_FILE_SIZE_IN_MB,
     DEFAULT_VIDEO_PATH,
+    get_file_size_in_mb,
     get_parquet_file_size_in_mb,
-    get_video_size_in_mb,
     to_parquet_with_hf_images,
     update_chunk_file_indices,
     write_info,
@@ -217,6 +217,7 @@ def aggregate_datasets(
         robot_type=robot_type,
         features=features,
         root=aggr_root,
+        use_videos=len(video_keys) > 0,
         chunks_size=chunk_size,
         data_files_size_in_mb=data_files_size_in_mb,
         video_files_size_in_mb=video_files_size_in_mb,
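
Note on the `use_videos` argument: the flag passed in this hunk is true only when at least one video key exists. A minimal sketch of how such a flag could be derived, assuming a features dict in which each entry carries a `dtype` field. The helper `find_video_keys` and the sample `features` literal are illustrative assumptions, not code from this PR.

```python
def find_video_keys(features: dict[str, dict]) -> list[str]:
    """Return names of features whose dtype is "video" (assumed convention)."""
    return [key for key, ft in features.items() if ft.get("dtype") == "video"]


# Hypothetical feature spec: one camera stream stored as video, one state vector.
features = {
    "observation.images.top": {"dtype": "video", "shape": (480, 640, 3)},
    "observation.state": {"dtype": "float32", "shape": (6,)},
}

video_keys = find_video_keys(features)
use_videos = len(video_keys) > 0  # same expression as in the hunk above
```

With an image-only or state-only dataset, `video_keys` is empty and the flag is false.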
@@ -307,8 +308,9 @@ def aggregate_videos(src_meta, dst_meta, videos_idx, video_files_size_in_mb, chu
                 current_offset += src_duration
                 continue

-            src_size = get_video_size_in_mb(src_path)
-            dst_size = get_video_size_in_mb(dst_path)
+            # Check file sizes before appending
+            src_size = get_file_size_in_mb(src_path)
+            dst_size = get_file_size_in_mb(dst_path)

             if dst_size + src_size >= video_files_size_in_mb:
                 # Rotate to a new file, this source becomes start of new destination
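
Note on the rotation check: the hunk above compares the combined size of source and destination against `video_files_size_in_mb` before appending. A self-contained sketch of that comparison follows. `file_size_in_mb` is a hypothetical stand-in for `get_file_size_in_mb` (its real implementation is not shown in this diff), and treating one MB as 1024 * 1024 bytes is an assumption.

```python
import tempfile
from pathlib import Path


def file_size_in_mb(path: Path) -> float:
    """Hypothetical stand-in for get_file_size_in_mb: on-disk size in MiB."""
    return path.stat().st_size / (1024**2)


def should_rotate(src_path: Path, dst_path: Path, max_size_in_mb: float) -> bool:
    """True when appending src to dst would reach or exceed the size limit."""
    return file_size_in_mb(dst_path) + file_size_in_mb(src_path) >= max_size_in_mb


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src.mp4"
    dst = Path(tmp) / "dst.mp4"
    src.write_bytes(b"\0" * (2 * 1024 * 1024))  # 2 MiB placeholder payload
    dst.write_bytes(b"\0" * (3 * 1024 * 1024))  # 3 MiB placeholder payload
    rotate_at_5 = should_rotate(src, dst, max_size_in_mb=5)  # 3 + 2 >= 5
    rotate_at_6 = should_rotate(src, dst, max_size_in_mb=6)  # 3 + 2 < 6
```

Because the comparison uses `>=`, a destination that would land exactly on the limit is rotated rather than filled.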
11 changes: 11 additions & 0 deletions src/lerobot/datasets/dataset_tools.py
@@ -42,6 +42,7 @@
     DEFAULT_DATA_PATH,
     DEFAULT_EPISODES_PATH,
     get_parquet_file_size_in_mb,
+    load_episodes,
     to_parquet_with_hf_images,
     update_chunk_file_indices,
     write_info,
@@ -436,6 +437,9 @@ def _copy_and_reindex_data(
     Returns:
         dict mapping episode index to its data file metadata (chunk_index, file_index, etc.)
     """
+    if src_dataset.meta.episodes is None:
+        src_dataset.meta.episodes = load_episodes(src_dataset.meta.root)
+
     file_to_episodes: dict[Path, set[int]] = {}
     for old_idx in episode_mapping:
         file_path = src_dataset.meta.get_data_file_path(old_idx)
@@ -645,6 +649,8 @@ def _copy_and_reindex_videos(
     Returns:
         dict mapping episode index to its video metadata (chunk_index, file_index, timestamps)
     """
+    if src_dataset.meta.episodes is None:
+        src_dataset.meta.episodes = load_episodes(src_dataset.meta.root)

     episodes_video_metadata: dict[int, dict] = {new_idx: {} for new_idx in episode_mapping.values()}

@@ -770,6 +776,9 @@ def _copy_and_reindex_episodes_metadata(
     """
     from lerobot.datasets.utils import flatten_dict

+    if src_dataset.meta.episodes is None:
+        src_dataset.meta.episodes = load_episodes(src_dataset.meta.root)
+
     all_stats = []
     total_frames = 0
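
Note on the repeated guard: `_copy_and_reindex_data`, `_copy_and_reindex_videos`, and `_copy_and_reindex_episodes_metadata` each load `meta.episodes` on demand when it is still `None`. The pattern can be sketched in isolation as below. `Meta`, `ensure_episodes`, and `fake_loader` are hypothetical names used only for illustration; the PR itself calls `load_episodes(src_dataset.meta.root)` inline.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Meta:
    """Hypothetical stand-in for dataset metadata with lazily loaded episodes."""

    root: str
    episodes: Optional[list[dict]] = None


def ensure_episodes(meta: Meta, loader: Callable[[str], list[dict]]) -> list[dict]:
    """Load episodes once, on first use, and reuse the cached value afterwards."""
    if meta.episodes is None:
        meta.episodes = loader(meta.root)
    return meta.episodes


load_calls = 0


def fake_loader(root: str) -> list[dict]:
    """Illustrative loader that counts how often it is invoked."""
    global load_calls
    load_calls += 1
    return [{"episode_index": 0, "length": 120}]


meta = Meta(root="/tmp/dataset")  # the path is never read by fake_loader
first = ensure_episodes(meta, fake_loader)
second = ensure_episodes(meta, fake_loader)  # served from the cached attribute
```

Factoring the guard into one helper would remove the triplication, though the inline form keeps each function readable on its own.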

@@ -831,6 +840,8 @@ def _copy_and_reindex_episodes_metadata(

         total_frames += src_episode["length"]

+    dst_meta._close_writer()
+
     dst_meta.info.update(
         {
             "total_episodes": len(episode_mapping),
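
Note on `dst_meta._close_writer()`: the last hunk closes the metadata writer before `dst_meta.info` is updated, in line with the commits "incremental parquet writing" and "Added proper closing of parquet file before reading". The sketch below illustrates the general hazard with a deliberately simplified, hypothetical `BufferedRowWriter` that uses JSON lines instead of Parquet. It is an analogy under that assumption, not the PR's writer.

```python
import json
import tempfile
from pathlib import Path


class BufferedRowWriter:
    """Hypothetical writer: rows stay in memory until close() flushes them."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self._rows: list[dict] = []
        self._closed = False

    def write(self, row: dict) -> None:
        if self._closed:
            raise ValueError("writer is closed")
        self._rows.append(row)

    def close(self) -> None:
        # Idempotent, so calling close() twice is harmless.
        if self._closed:
            return
        with self.path.open("w") as f:
            for row in self._rows:
                f.write(json.dumps(row) + "\n")
        self._closed = True


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "episodes.jsonl"
    writer = BufferedRowWriter(path)
    writer.write({"episode_index": 0, "length": 120})
    writer.write({"episode_index": 1, "length": 95})
    exists_before_close = path.exists()  # nothing on disk yet
    writer.close()  # flush first, then read
    rows = [json.loads(line) for line in path.read_text().splitlines()]
    total_frames = sum(row["length"] for row in rows)
```

Any reader that opens the file before `close()` sees incomplete data, which is why the close call precedes the metadata update.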