
bug: aggregate data meta mapping misalignment #2264

@bingogome

Description


System Info

- lerobot version: 0.3.4
- Platform: Linux-6.14.0-24-generic-x86_64-with-glibc2.39
- Python version: 3.10.19
- Huggingface Hub version: 0.35.3
- Datasets version: 4.1.1
- Numpy version: 1.26.4
- PyTorch version: 2.5.1+cu124
- Is PyTorch built with CUDA support?: True
- Cuda version: 12.4
- GPU model: NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition

Information

  • One of the scripts in the examples/ folder of LeRobot
  • My own task or dataset (give details below)

Reproduction

The error occurs when merging datasets A and B, where A has:

data/chunk-000/file-000.parquet
data/chunk-000/file-001.parquet
videos/observation.images.left/chunk-000/file-000.mp4
videos/observation.images.left/chunk-000/file-001.mp4

and B has:

data/chunk-000/file-000.parquet
videos/observation.images.left/chunk-000/file-000.mp4
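A minimal sketch of how the per-episode file mapping could be kept consistent during a merge, assuming a simplified single-chunk layout. `EpisodeMeta`, `remap_episodes`, `data_map`, and `video_map` are hypothetical names for illustration and are not LeRobot's actual metadata schema or aggregate.py API. The idea is to record, separately for data and for video files, which output file each input file ended up in, and then rewrite every episode's indices from those records instead of assuming data and video files line up one-to-one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeMeta:
    # Hypothetical, simplified per-episode metadata (single chunk assumed).
    source: str            # input dataset the episode came from ("A", "B", ...)
    data_file_index: int   # N in data/chunk-000/file-N.parquet
    video_file_index: int  # N in videos/<key>/chunk-000/file-N.mp4


def remap_episodes(
    episodes: list[EpisodeMeta],
    data_map: dict[tuple[str, int], int],
    video_map: dict[tuple[str, int], int],
) -> list[EpisodeMeta]:
    """Rewrite file indices using the files the merge actually wrote.

    data_map / video_map map (source, old_file_index) -> new_file_index and
    are kept separate, so parquet and mp4 files may be merged differently.
    A missing entry raises KeyError instead of silently keeping a stale index.
    """
    return [
        EpisodeMeta(
            source="merged",
            data_file_index=data_map[(ep.source, ep.data_file_index)],
            video_file_index=video_map[(ep.source, ep.video_file_index)],
        )
        for ep in episodes
    ]


# Usage with the A/B layout above: A has two data and two video files, B has one of each.
episodes = [EpisodeMeta("A", 0, 0), EpisodeMeta("A", 1, 1), EpisodeMeta("B", 0, 0)]
# Hypothetical merge: all parquet files concatenated into file-000,
# while each mp4 is copied to its own output file.
data_map = {("A", 0): 0, ("A", 1): 0, ("B", 0): 0}
video_map = {("A", 0): 0, ("A", 1): 1, ("B", 0): 2}
merged = remap_episodes(episodes, data_map, video_map)
```

Keeping one map per file type means the data and video file counts are free to diverge after the merge without the metadata drifting out of sync.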

Running the data conversion tools on the merged dataset then fails with:

"/.../convert_dataset_v30_to_v21.py", line 175, in convert_data
raise FileNotFoundError(f"Expected source parquet file not found: {source_path}")
FileNotFoundError: Expected source parquet file not found: .../data/chunk-000/file-001.parquet

Expected behavior

The old aggregate.py produces a merged dataset like the following, in which the number of video files does not match the number of data files and the metadata maps episodes to the wrong files:

data/chunk-000/file-000.parquet
videos/observation.images.left/chunk-000/file-000.mp4
videos/observation.images.left/chunk-000/file-001.mp4
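A quick way to catch this kind of misalignment before running the conversion script is to check that every file referenced by the episode metadata actually exists. The sketch below assumes the path layout shown above; the `episodes` structure (a dict with a `"data"` tuple and a `"videos"` dict of `(chunk, file)` pairs per video key) and the `find_missing_files` helper are hypothetical and are not LeRobot's metadata format.

```python
from pathlib import Path

# Path templates assumed from the layout shown in this issue.
DATA_TEMPLATE = "data/chunk-{chunk:03d}/file-{file:03d}.parquet"
VIDEO_TEMPLATE = "videos/{key}/chunk-{chunk:03d}/file-{file:03d}.mp4"


def find_missing_files(root, episodes):
    """Return sorted relative paths referenced by `episodes` that are missing under `root`.

    episodes: iterable of hypothetical dicts such as
      {"data": (chunk, file), "videos": {video_key: (chunk, file)}}
    """
    root = Path(root)
    missing = set()
    for ep in episodes:
        chunk, file = ep["data"]
        referenced = [DATA_TEMPLATE.format(chunk=chunk, file=file)]
        for key, (v_chunk, v_file) in ep["videos"].items():
            referenced.append(VIDEO_TEMPLATE.format(key=key, chunk=v_chunk, file=v_file))
        for rel in referenced:
            if not (root / rel).is_file():
                missing.add(rel)
    return sorted(missing)
```

On the merged layout above, metadata that still points an episode at data file 1 would be reported as referencing `data/chunk-000/file-001.parquet`, the same file the conversion script fails on.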

Labels: bug, dataset
