Open
Labels
bug: Something isn’t working correctly
dataset: Issues regarding data inputs, processing, or datasets
Description
System Info
- lerobot version: 0.3.4
- Platform: Linux-6.14.0-24-generic-x86_64-with-glibc2.39
- Python version: 3.10.19
- Huggingface Hub version: 0.35.3
- Datasets version: 4.1.1
- Numpy version: 1.26.4
- PyTorch version: 2.5.1+cu124
- Is PyTorch built with CUDA support?: True
- Cuda version: 12.4
- GPU model: NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition
Information
- One of the scripts in the examples/ folder of LeRobot
- My own task or dataset (give details below)
Reproduction
The error occurs when merging datasets A and B, where A has
data/chunk-000/file-000.parquet
data/chunk-000/file-001.parquet
videos/observation.images.left/chunk-000/file-000.mp4
videos/observation.images.left/chunk-000/file-001.mp4
and B has
data/chunk-000/file-000.parquet
videos/observation.images.left/chunk-000/file-000.mp4
Running the data conversion tools on the merged dataset then fails and reports:
File "/.../convert_dataset_v30_to_v21.py", line 175, in convert_data
raise FileNotFoundError(f"Expected source parquet file not found: {source_path}")
FileNotFoundError: Expected source parquet file not found: .../data/chunk-000/file-001.parquet
Expected behavior
The old aggregate.py produces a merged dataset like the following example, in which the number of video files does not match the number of data files and the metadata contains an incorrect mapping:
data/chunk-000/file-000.parquet
videos/observation.images.left/chunk-000/file-000.mp4
videos/observation.images.left/chunk-000/file-001.mp4
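A quick way to spot this kind of layout on disk is to compare the chunk/file indices found under data/ with those under each videos/<camera>/ directory. The following is only a rough sketch, not part of LeRobot: the helper names `file_indices` and `layout_mismatches` are hypothetical, and the check assumes (as this report implies, but which may not be a strict invariant of the format) that every data file should have a video file with the same chunk/file index. A reported difference is therefore a hint to inspect the metadata, not proof of corruption.

```python
import re
from pathlib import Path

# Matches the "chunk-XXX/file-YYY.<ext>" tail used in the listings above.
_INDEX_RE = re.compile(r"chunk-(\d+)/file-(\d+)\.\w+$")


def file_indices(base: Path, pattern: str) -> set[tuple[int, int]]:
    """Return the (chunk, file) index pairs of files under `base` matching `pattern`."""
    indices = set()
    for path in base.glob(pattern):
        match = _INDEX_RE.search(path.as_posix())
        if match:
            indices.add((int(match.group(1)), int(match.group(2))))
    return indices


def layout_mismatches(root: Path) -> dict[str, dict[str, list[tuple[int, int]]]]:
    """Hypothetical helper: compare data/ indices with each videos/<camera>/ directory.

    Assumes data and video files are expected to share the same indices,
    which is an assumption taken from this report, not a documented rule.
    """
    data = file_indices(root / "data", "chunk-*/file-*.parquet")
    report = {}
    videos_dir = root / "videos"
    if not videos_dir.is_dir():
        return report
    for camera_dir in sorted(p for p in videos_dir.iterdir() if p.is_dir()):
        videos = file_indices(camera_dir, "chunk-*/file-*.mp4")
        if videos != data:
            report[camera_dir.name] = {
                "video_without_data": sorted(videos - data),
                "data_without_video": sorted(data - videos),
            }
    return report
```

Calling `layout_mismatches(Path("path/to/merged_dataset"))` (the path is a placeholder) returns an empty dict when the indices line up, and otherwise one entry per camera directory listing the indices present on only one side.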