fix: raise error when folder-based datasets are loaded without data_dir or data_files #7618
Related Issues/PRs
#6152

What changes are proposed in this pull request?
This PR adds an early validation step for folder-based datasets (like audiofolder) to prevent silent fallback behavior.

Before this fix:
When data_dir or data_files were not provided, the loader defaulted to the current working directory.

Now:
If both data_dir and data_files are missing, a ValueError is raised early with a helpful message.

How is this PR tested?
load_dataset("audiofolder") with missing data_dir
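
For reviewers, the check and the manual test above can be sketched roughly as follows. This is a minimal standalone illustration, not the code in this diff: the helper name check_folder_based_args, the FOLDER_BASED_BUILDERS set, and the error message wording are all hypothetical, and the sketch does not import or call the datasets library.

```python
from typing import Optional, Sequence, Union

# Assumption: an illustrative set of folder-based builder names.
# The real loader may determine this differently.
FOLDER_BASED_BUILDERS = {"audiofolder", "imagefolder"}


def check_folder_based_args(
    path: str,
    data_dir: Optional[str] = None,
    data_files: Optional[Union[str, Sequence[str]]] = None,
) -> None:
    """Raise early instead of silently falling back to the current directory.

    Hypothetical helper: mirrors the behavior described in this PR, where a
    folder-based dataset loaded without data_dir or data_files is an error.
    """
    if path in FOLDER_BASED_BUILDERS and data_dir is None and data_files is None:
        raise ValueError(
            f"Folder-based dataset '{path}' requires either `data_dir` or "
            "`data_files` to be specified."
        )


if __name__ == "__main__":
    # Missing both arguments: the check should raise.
    try:
        check_folder_based_args("audiofolder")
    except ValueError as err:
        print(f"Raised as expected: {err}")

    # Providing data_dir: the check should pass silently.
    check_folder_based_args("audiofolder", data_dir="./my_audio")
```

Raising before any file resolution happens keeps the failure close to the user's call site, which is the point of the "early" validation described above.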
Does this PR require documentation update?
Release Notes
Is this a user-facing change?
What component(s), interfaces, languages, and integrations does this PR affect?
Components:
area/datasets
area/load
How should the PR be classified in the release notes? Choose one:
rn/bug-fix
- A user-facing bug fix worth mentioning in the release notes
Should this PR be included in the next patch release?