

@rmnskb (Contributor) commented Oct 30, 2025

Rationale for this change

See #47728. Validate the source argument of pyarrow.parquet.read_table when pyarrow.dataset is not available.

What changes are included in this PR?

Check the source argument and raise a ValueError if it is either a list of .parquet files or a directory.

Are these changes tested?

Yes

Are there any user-facing changes?

No

If the source argument is a directory, I decided not to check for that directly. Instead, I catch the exceptions raised by fs.open_input_file, which already performs that check, and raise an additional exception on top of the stack that explains the actual reason.
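The approach described above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: open_single_file is a hypothetical helper, filesystem is assumed to be any object with an open_input_file(path) method (as pyarrow.fs filesystems provide), and the error message wording is an assumption.

```python
def open_single_file(source, filesystem):
    """Hypothetical helper: open `source` as a single file when
    pyarrow.dataset is unavailable (sketch, not the PR's code)."""
    # A list of paths can only be read through pyarrow.dataset.
    if isinstance(source, list):
        raise ValueError(
            "the 'source' argument cannot be a list of files "
            "when the pyarrow.dataset module is not available"
        )
    try:
        # Let the filesystem reject directories instead of checking up front.
        return filesystem.open_input_file(source)
    except OSError as e:
        # FileNotFoundError and IsADirectoryError are both OSError subclasses,
        # but so is PermissionError: every OSError is translated here.
        raise ValueError(
            f"could not open {source!r} as a single file; reading a "
            "directory requires the pyarrow.dataset module"
        ) from e
```

Since FileNotFoundError is a subclass of OSError, catching OSError alone already covers it. Raising with `from e` keeps the original exception available as `__cause__`, which is what places the explanatory error on top of the traceback.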

@raulcd (Member) left a comment


Thanks for working on this!

if isinstance(source, list):
    raise ValueError(
        "the 'source' argument cannot be a list of files "
        "when the pyarrow.dataset is not available"
    )
Member


Suggested change
- "when the pyarrow.dataset is not available"
+ "when the pyarrow.dataset module is not available"

# TODO test that source is not a directory or a list
try:
    source = filesystem.open_input_file(path)
except (OSError, FileNotFoundError) as e:
Member


Does filesystem.open_input_file raise OSError and/or FileNotFoundError only when source is a directory rather than a file? Could it, for example, also raise OSError when the process doesn't have permission to read the file, or in other cases?
If either of those exceptions can be raised in other cases, we might mislead users, because we would always raise a ValueError whose message points to the directory-vs-file issue.
We might want to check with something like the following (I haven't tested it, just looked at the filesystem API):

filesystem.get_file_info(path).is_file
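A sketch of that suggestion, under stated assumptions: open_checked_file is a hypothetical name, filesystem is any object with get_file_info(path) and open_input_file(path) (such as a pyarrow.fs filesystem), and FileInfo.type is assumed to be a pyarrow.fs.FileType enum member whose .name is "Directory" for directories. Only the directory case becomes a ValueError; every other error keeps its original type and message.

```python
def open_checked_file(path, filesystem):
    """Hypothetical helper: reject directories explicitly before opening
    (sketch of the suggested check, not tested against pyarrow)."""
    info = filesystem.get_file_info(path)
    # Assumes info.type is a FileType enum member; with pyarrow imported,
    # comparing against pyarrow.fs.FileType.Directory would be more direct.
    if info.type.name == "Directory":
        raise ValueError(
            f"{path!r} is a directory; reading a directory requires "
            "the pyarrow.dataset module"
        )
    # Any other problem (missing file, permissions, ...) is raised by the
    # filesystem unchanged, so it is not mislabelled as a directory issue.
    return filesystem.open_input_file(path)
```

With pyarrow installed this would be called as open_checked_file("data.parquet", pyarrow.fs.LocalFileSystem()), where the path is only an example. One trade-off: checking and then opening are two separate filesystem calls, so the path could change in between. Translating exceptions avoids that race but cannot tell error causes apart as precisely.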

@github-actions bot added the "awaiting changes" label and removed the "awaiting review" label on Oct 31, 2025
