Add Mask::count() method to count true elements #490
Add Mask::count() method
Motivation
The Mask API currently provides boolean queries (any(), all()) and index queries (first_set()), but lacks a method to count the number of true elements. This forces users to either convert to arrays and iterate, or manually use to_bitmask().count_ones(), which exposes implementation details.
Current workarounds:
Proposed:
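A minimal sketch of the contrast between the two workarounds and the proposed call. To stay compilable on stable Rust it uses a hypothetical stand-in type, BoolMask, that only mimics the methods discussed here; it is not the real core::simd::Mask, and the real signatures may differ.

```rust
/// Hypothetical stand-in for a SIMD mask: one bool per lane.
/// Not the real `core::simd::Mask`; it only mimics the calls discussed here.
#[derive(Clone, Copy)]
pub struct BoolMask<const N: usize>([bool; N]);

impl<const N: usize> BoolMask<N> {
    pub fn from_array(lanes: [bool; N]) -> Self {
        Self(lanes)
    }

    pub fn to_array(self) -> [bool; N] {
        self.0
    }

    /// Packs lane `i` into bit `i` (this sketch assumes N <= 64).
    pub fn to_bitmask(self) -> u64 {
        self.0
            .iter()
            .enumerate()
            .fold(0u64, |bits, (i, &lane)| bits | ((lane as u64) << i))
    }

    /// The proposed convenience method, sketched on top of the bitmask.
    #[must_use]
    pub fn count(self) -> usize {
        self.to_bitmask().count_ones() as usize
    }
}

/// Workaround 1: convert to an array and iterate over the lanes.
pub fn count_via_array<const N: usize>(mask: BoolMask<N>) -> usize {
    mask.to_array().iter().filter(|&&lane| lane).count()
}

/// Workaround 2: go through the bitmask representation by hand.
pub fn count_via_bitmask<const N: usize>(mask: BoolMask<N>) -> usize {
    mask.to_bitmask().count_ones() as usize
}
```

With the proposed method, callers write mask.count() and no longer need to know that a bitmask exists underneath.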
This pattern appears frequently in SIMD code when pre-sizing allocations to avoid reallocation overhead:
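As a rough illustration of the pre-sizing pattern, the sketch below reserves exactly as many slots as there are true lanes before copying the selected values. The function name select_presized is hypothetical, and a plain [bool; 8] array stands in for a SIMD mask so the example compiles on stable Rust.

```rust
/// Collects the values whose lane is set, reserving exactly as many slots
/// as there are true lanes. `lanes` is a stand-in for a SIMD mask.
pub fn select_presized(values: &[i32; 8], lanes: [bool; 8]) -> Vec<i32> {
    // Stand-in for `mask.count()`: the number of true lanes.
    let hits = lanes.iter().filter(|&&lane| lane).count();

    // One allocation of the right size, so the pushes below never reallocate.
    let mut out = Vec::with_capacity(hits);
    for (value, &keep) in values.iter().zip(lanes.iter()) {
        if keep {
            out.push(*value);
        }
    }
    out
}
```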
Other common use cases include histogram generation, SQL-style COUNT aggregation, and sparse data analysis.
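The COUNT-style aggregation could look roughly like the following sketch, which processes the input in 8-lane chunks, builds a per-chunk bitmask from a predicate, and sums the set bits. The function count_greater_than and the chunk width are illustrative assumptions, and a u8 bitmask stands in for the mask a SIMD comparison would produce.

```rust
/// SQL-style `COUNT(*) WHERE value > threshold`, processed in 8-lane chunks.
pub fn count_greater_than(values: &[i32], threshold: i32) -> usize {
    let mut total = 0usize;
    for chunk in values.chunks(8) {
        // Bit `i` is set when lane `i` of this chunk passes the predicate.
        let mut bits = 0u8;
        for (i, &v) in chunk.iter().enumerate() {
            bits |= ((v > threshold) as u8) << i;
        }
        // Equivalent of calling `count()` on the chunk's mask.
        total += bits.count_ones() as usize;
    }
    total
}
```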
API Design
Design decisions:
usize - Consistent with Iterator::count() and suitable for array indexing
count() not len() - len() implies container size; count() matches the semantic operation (counting true values)
#[must_use] attribute - Follows Vec::len() and slice::len() precedent (no message)
Not const - to_bitmask() uses intrinsics that cannot be const-evaluated
Implementation
The implementation delegates to to_bitmask().count_ones(), which already uses LLVM's llvm.ctpop intrinsic. This compiles to efficient platform-specific instructions:
x86_64: POPCNT (SSE4.2)
AArch64: CNT (NEON)
RISC-V: CPOP (Zbb extension)
WebAssembly: i64.popcnt
No platform-specific code is required; LLVM handles optimization for each target.
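The delegation described above can be sketched as follows. Mask16 is a hypothetical bitmask-backed stand-in for a 16-lane mask, not the type touched by this change, and count_naive exists only as a reference to check the result against.

```rust
/// Hypothetical bitmask-backed stand-in for a 16-lane mask
/// (not the real `Mask` type).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Mask16(u16);

impl Mask16 {
    pub fn from_bitmask(bits: u16) -> Self {
        Self(bits)
    }

    pub fn to_bitmask(self) -> u16 {
        self.0
    }

    /// Number of true lanes. Delegates to `count_ones`, which the compiler
    /// can lower to a population-count instruction where the target has one.
    #[must_use]
    #[inline]
    pub fn count(self) -> usize {
        self.to_bitmask().count_ones() as usize
    }
}

/// Naive reference: test each lane's bit in turn.
pub fn count_naive(mask: Mask16) -> usize {
    (0..16u32)
        .filter(|&lane| (mask.to_bitmask() >> lane) & 1 == 1)
        .count()
}
```

Returning usize lets the result feed directly into indexing or Vec::with_capacity without a cast at the call site.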
Performance
Benchmarked on x86_64 (Intel Core i7-14700HX, -C target-cpu=native):
Assembly verification shows the expected codegen (x86_64):
The operation is branch-free and density-independent: mask16 measured 1.03-1.05 ns across all densities (0%, 25%, 50%, 75%, 100%), confirming constant-time behavior regardless of how many elements are true.