Workaround to enable running PDS-H via WebHDFS #20132
Draft
Description
For HTTP-based remote files, Polars performs additional checks involving sending a HEAD request to the server. WebHDFS does not accept HEAD requests, causing PDS-H to error out.
This PR provides a workaround that enables running the PDS-H benchmark via WebHDFS without requiring an upstream change to Polars.
To enable this feature, users need to set two new environment variables, LIBCUDF_IO_REROUTE_LOCAL_DIR_PATTERN and LIBCUDF_IO_REROUTE_REMOTE_DIR_PATTERN. At runtime, any local file path is rewritten (in memory only; the original file is not affected) so that the first occurrence of the "local dir pattern" is replaced by the "remote dir pattern", and a remote file resource is used instead of a local file resource.
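The path rewriting described above can be illustrated with a minimal Python sketch. It only models the "replace the first occurrence" semantics and is not the libcudf implementation. The helper name reroute_path, the example directory, and the example WebHDFS URL are hypothetical, and the sketch assumes rerouting is active only when both variables are set and non-empty.

```python
import os

LOCAL_VAR = "LIBCUDF_IO_REROUTE_LOCAL_DIR_PATTERN"
REMOTE_VAR = "LIBCUDF_IO_REROUTE_REMOTE_DIR_PATTERN"


def reroute_path(path, env=None):
    """Return (possibly rewritten path, whether it was rerouted).

    Hypothetical helper: it mimics the behavior described in this PR,
    not the actual libcudf code.
    """
    env = os.environ if env is None else env
    local = env.get(LOCAL_VAR)
    remote = env.get(REMOTE_VAR)

    # Assumption: the feature is off unless both variables are set.
    if not local or not remote:
        return path, False

    # Paths that do not contain the local pattern are left untouched.
    if local not in path:
        return path, False

    # Replace only the first occurrence; the file on disk is not modified.
    return path.replace(local, remote, 1), True


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    example_env = {
        LOCAL_VAR: "/data/pdsh",
        REMOTE_VAR: "http://example-host:9870/webhdfs/v1/pdsh",
    }
    print(reroute_path("/data/pdsh/lineitem.parquet", example_env))
```

In this sketch, a path that contains the local pattern is turned into a remote URL, so the reader would open it as a remote file resource. Any other path is returned unchanged.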