@kingcrimsontianyu kingcrimsontianyu commented Sep 29, 2025

Description

For HTTP-based remote files, Polars performs additional checks that involve sending a HEAD request to the server. WebHDFS does not accept HEAD requests, so the PDS-H benchmark fails with the following error:

OSError: object-store error: Generic HTTP error: Error performing HEAD http://localhost:9870/webhdfs/v1/home/tialiu/scale-100.0/supplier.parquet in 507.238µs - Server returned non-2xx status code: 400 Bad Request:

This PR provides a workaround that allows the PDS-H benchmark to run over WebHDFS without requiring an upstream change to Polars.

To enable this feature, users set two new environment variables, LIBCUDF_IO_REROUTE_LOCAL_DIR_PATTERN and LIBCUDF_IO_REROUTE_REMOTE_DIR_PATTERN. At runtime, each local file path is rewritten (in memory only; the original file is not affected) so that the first occurrence of the "local dir pattern" is replaced by the "remote dir pattern", and a remote file resource is used instead of a local file resource.
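
The rerouting rule described above can be illustrated with a minimal Python sketch. This is not the libcudf implementation (which lives in C++); the function name `reroute_path`, the example local directory, and the behavior when a variable is unset or the pattern is absent (the path is returned unchanged) are assumptions made for illustration only.

```python
import os

LOCAL_ENV = "LIBCUDF_IO_REROUTE_LOCAL_DIR_PATTERN"
REMOTE_ENV = "LIBCUDF_IO_REROUTE_REMOTE_DIR_PATTERN"


def reroute_path(path, env=None):
    """Return (possibly rewritten path, whether it was rerouted).

    Hypothetical helper: replaces only the first occurrence of the
    local pattern with the remote pattern. The file on disk is never
    touched; only the in-memory string changes.
    """
    env = os.environ if env is None else env
    local_pattern = env.get(LOCAL_ENV)
    remote_pattern = env.get(REMOTE_ENV)

    # Assumption: rerouting is active only when both variables are set.
    if not local_pattern or not remote_pattern:
        return path, False

    # Assumption: paths that do not contain the pattern stay local.
    if local_pattern not in path:
        return path, False

    # count=1 restricts the substitution to the first occurrence.
    return path.replace(local_pattern, remote_pattern, 1), True


if __name__ == "__main__":
    # Example values are illustrative; the local directory is assumed.
    example_env = {
        LOCAL_ENV: "/home/tialiu",
        REMOTE_ENV: "http://localhost:9870/webhdfs/v1/home/tialiu",
    }
    new_path, rerouted = reroute_path(
        "/home/tialiu/scale-100.0/supplier.parquet", example_env
    )
    print(new_path, rerouted)
```

With these example values, a reader that would have opened the local Parquet file instead receives a WebHDFS URL, so the caller (here, Polars) only ever sees a local path and never issues the HEAD request.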

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

copy-pr-bot bot commented Sep 29, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the libcudf, Python, and cudf-polars labels Sep 29, 2025
@GPUtester GPUtester moved this to In Progress in cuDF Python Sep 29, 2025
@kingcrimsontianyu kingcrimsontianyu added the improvement and non-breaking labels Sep 29, 2025