Conversation

JigaoLuo
Contributor

Description

For issue #18967, this PR is the first part of merging draft PR #18968. It adds host-pinned vector construction in vector_factories.hpp; after a careful read-through, I have also improved the comments in that file.
(As discussed, I have additionally made manual changes to reduction.cuh and page_data.cu.)

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@JigaoLuo JigaoLuo requested a review from a team as a code owner September 26, 2025 10:58

copy-pr-bot bot commented Sep 26, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Sep 26, 2025
@JigaoLuo JigaoLuo marked this pull request as draft September 26, 2025 11:00
@JigaoLuo
Contributor Author

I’ve marked this as a draft to remind myself to run the script and count how many pageable copies this PR eliminates before merging.

Contributor Author


For reference, this links back to the draft PR, which covers the full change in reduction.cuh:

https://github.com/rapidsai/cudf/pull/18968/files#diff-d99740825ef0d2e73c3e8392d06ca11b229400d864913b4221f3f3626ad95f85

Comment on lines 69 to 72
+ auto pinned_initial = cudf::detail::make_pinned_vector_async<OutputType>(1, stream);
+ pinned_initial[0] = initial_value;
  using ScalarType = cudf::scalar_type_t<OutputType>;
- auto result = std::make_unique<ScalarType>(initial_value, true, stream, mr);
+ auto result = std::make_unique<ScalarType>(pinned_initial[0], true, stream, mr);
Contributor Author


As we discussed on Slack: assign initial_value to element zero of a pinned vector, effectively treating it like a pinned scalar.

Contributor


I forgot most of the context here :(
Are we passing the value by reference here?

Contributor Author


No, we are not passing by reference here.

Contributor Author


Let me bring back some context from our Slack chat. The goal is for ScalarType and cub::DeviceReduce::Reduce to copy the initial_value from host-pinned memory.

Back around August 19th in Slack, we discussed:

  • placing the initial_value in a pinned host vector of size 1
  • and then assigning the value to the first element [0].


@JigaoLuo JigaoLuo force-pushed the no-miss-sync-pinned-factory branch from 1f8216e to 4c8591b Compare October 3, 2025 13:40
Contributor Author

@JigaoLuo JigaoLuo Oct 3, 2025


Changed the types of offsets and buff_addrs to cudf::detail::host_vector for the call to the write_final_offsets function. This is the only place where the function is called.

Contributor Author


So there is no need to change the write_final_offsets function in cpp/src/io/parquet/page_data.cu.

@JigaoLuo JigaoLuo force-pushed the no-miss-sync-pinned-factory branch from 9cb8f6f to 1bb499f Compare October 3, 2025 19:06
Comment on lines 129 to 130
auto pinned_initial = cudf::detail::make_pinned_vector_async<OutputType>(1, stream);
pinned_initial[0] = initial_value;
Contributor


I don't think we need the pinned vector here, since cudf::detail::device_scalar will use the bounce buffer for the H2D copy anyway.

Contributor Author

@JigaoLuo JigaoLuo Oct 4, 2025


That's true; I'll revert it. But I have one question: does cub::DeviceReduce::Reduce actually copy the initial_value from host memory?

To investigate this question, I ran experiments using both a pinned host vector and a regular one.

$ nsys profile ./REDUCTIONS_TEST 
$ nsys export --output report1.sqlite --type sqlite report1.nsys-rep
$ nsys analyze -r cuda_memcpy_async:rows=-1 report1.nsys-rep | wc -l 

I didn't observe any difference in the pageable-copy count, which suggests that CUB avoids pageable memory internally.


  • What makes this confusing is that I recall doing a similar experiment a few months ago to pinpoint a pageable memory bottleneck. I’m fairly sure I found one and managed to eliminate it back then.
  • (I also tried reading through the CUB source code, but it gets pretty hard to follow past the dispatch logic and the various specialization paths.)

Comment on lines +421 to +422
auto out_buffers = cudf::detail::make_host_vector<size_type*>(0, _stream);
auto final_offsets = cudf::detail::make_host_vector<size_type>(0, _stream);
Contributor Author


Note: cudf::detail::host_vector should behave like thrust::host_vector while using the cudf/RMM memory allocator.

The reason I raise this is that most existing uses of host_vector in cudf treat it as a fixed-size array. In contrast, this particular case starts zero-sized and relies on dynamic resizing.
