
@ErdaradunGaztea (Contributor)

Fixes #1036

This PR reimplements every(), some(), and none() in C. I followed the C implementation of map(), since I have no experience writing C. The interface does not change at all, and the behaviour should also remain intact.

The key change is that I used as_mapper() instead of as_predicate(). The difference between these two functions is that as_predicate() also performs a boolean check on the predicate's output. That check was computationally expensive, and even moving it into a .Call() didn't help, since the code kept switching between the C and R contexts. My final solution was to perform these checks in C, in the same code that runs the predicate-checking loop.
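As a rough illustration of performing the check inside the loop, here is a pure-R sketch. every_sketch() is a hypothetical name, and purrr's actual code does this in C. Each result of .p is validated right where it is produced, rather than by wrapping .p in a separate checking closure, and NA is handled like all(): a FALSE wins, otherwise any NA makes the result NA.

```r
# Hypothetical sketch, not purrr's implementation: validate each predicate
# result inside the same loop that evaluates it.
every_sketch <- function(.x, .p) {
  out <- TRUE
  for (i in seq_along(.x)) {
    val <- .p(.x[[i]])
    # The boolean check happens here, once per element, in the loop itself.
    if (!is.logical(val) || length(val) != 1L) {
      stop("`.p` must return a single `TRUE`, `FALSE`, or `NA`.", call. = FALSE)
    }
    if (isFALSE(val)) {
      return(FALSE)  # early return: remaining elements are never visited
    }
    if (is.na(val)) {
      out <- NA  # remember the NA, but keep looking for a FALSE
    }
  }
  out
}

every_sketch(list(1L, 2L, 3L), is.integer)
every_sketch(list(1L, "a"), is.integer)
```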

I had to replace the implementation of none() with a separate C solution, because negate() added a large overhead. However, all three functions now share almost all of their C implementation.
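A minimal pure-R sketch of how the three functions can share one loop, assuming (for illustration only) that they differ just in which predicate value ends the search and in what they return. predicate_loop() and the *_shared() wrappers are hypothetical names, not purrr's API or its C code.

```r
# Hypothetical sketch, not purrr's C code: one loop shared by every(),
# some(), and none(), parameterised by which value ends the search.
predicate_loop <- function(.x, .p, mode = c("every", "some", "none")) {
  mode <- match.arg(mode)
  # every() stops at the first FALSE; some() and none() stop at the first TRUE.
  stop_on <- mode != "every"
  seen_na <- FALSE
  for (i in seq_along(.x)) {
    val <- .p(.x[[i]])
    if (!is.logical(val) || length(val) != 1L) {
      stop("`.p` must return a single `TRUE`, `FALSE`, or `NA`.", call. = FALSE)
    }
    if (is.na(val)) {
      seen_na <- TRUE
    } else if (val == stop_on) {
      # Early return: only some() reports TRUE when it stops.
      return(mode == "some")
    }
  }
  # No early stop: NA if any NA was seen, else the "nothing found" answer.
  if (seen_na) NA else mode != "some"
}

every_shared <- function(.x, .p) predicate_loop(.x, .p, "every")
some_shared  <- function(.x, .p) predicate_loop(.x, .p, "some")
none_shared  <- function(.x, .p) predicate_loop(.x, .p, "none")
```

Because none_shared() stops on the first TRUE directly, rather than being written as every(.x, negate(.p)), each element costs one call to .p with no extra wrapper closure call on top.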

Finally, the performance. every(), some(), and none() should now be on par with the equivalent map_lgl() + all()/any() combinations, except that they still return early, which can make them much faster.

```r
library(purrr)

x <- as.list(1:10000)

fn <- function(x) {
  vctrs::vec_is(x) || is.null(x)
}

# Three basic benchmarks
bench::mark(
  all(map_lgl(x, fn)),
  every(x, fn),
  min_time = 1
)
#> # A tibble: 2 × 6
#>   expression               min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>          <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 all(map_lgl(x, fn))   15.8ms   16.3ms      59.0    4.15MB     44.3
#> 2 every(x, fn)          15.4ms   16.2ms      59.5   12.16KB     46.5

bench::mark(
  any(map_lgl(x, vctrs::vec_is_list)),
  some(x, vctrs::vec_is_list),
  min_time = 1
)
#> # A tibble: 2 × 6
#>   expression                             min median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                          <bch:> <bch:>     <dbl> <bch:byt>    <dbl>
#> 1 any(map_lgl(x, vctrs::vec_is_list)) 6.94ms  7.2ms      135.    43.7KB     29.6
#> 2 some(x, vctrs::vec_is_list)         6.79ms 7.09ms      138.     9.7KB     27.5

bench::mark(
  !any(map_lgl(x, is.null)),
  none(x, is.null),
  min_time = 1
)
#> # A tibble: 2 × 6
#>   expression                     min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 !any(map_lgl(x, is.null))   4.78ms   4.99ms      197.   228.5KB     33.2
#> 2 none(x, is.null)             4.8ms   4.96ms      194.     9.7KB     33.2

# `negate()` has a lot of overhead
bench::mark(
  all(map_lgl(x, negate(is.null))),
  every(x, negate(is.null)),
  min_time = 1
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression                            min  median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                       <bch:tm> <bch:t>     <dbl> <bch:byt>    <dbl>
#> 1 all(map_lgl(x, negate(is.null)))    112ms   117ms      8.55    3.26MB     32.3
#> 2 every(x, negate(is.null))           114ms   124ms      7.84    2.67MB     24.5

# An early stop example
bench::mark(
  any(map_lgl(x, is.integer)),
  some(x, is.integer),
  min_time = 1
)
#> # A tibble: 2 × 6
#>   expression                       min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 any(map_lgl(x, is.integer))   4.87ms   5.26ms      176.      42KB     31.1
#> 2 some(x, is.integer)           53.3µs   61.4µs    14756.        0B     16.2
```

Created on 2025-02-06 with reprex v2.1.1
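The gap in the early-stop benchmark comes from the predicate simply not being called after the first TRUE. A base-R sketch makes this visible by counting predicate calls; the counter, counting_is_integer(), and some_early() are hypothetical helpers for illustration, and some_early() omits NA handling for brevity.

```r
# Count how often the predicate runs under each strategy.
x <- as.list(1:10000)
n_calls <- 0L
counting_is_integer <- function(v) {
  n_calls <<- n_calls + 1L
  is.integer(v)
}

# Map-then-reduce: the predicate runs on every element before any() sees them.
n_calls <- 0L
res_map <- any(vapply(x, counting_is_integer, logical(1)))
calls_map <- n_calls

# Hypothetical loop with an early return (NA handling omitted for brevity).
some_early <- function(.x, .p) {
  for (i in seq_along(.x)) {
    if (isTRUE(.p(.x[[i]]))) return(TRUE)
  }
  FALSE
}
n_calls <- 0L
res_loop <- some_early(x, counting_is_integer)
calls_loop <- n_calls

c(map = calls_map, loop = calls_loop)  # map: 10000, loop: 1
```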

hadley requested a review from DavisVaughan on September 2, 2025, 22:06
@hadley (Member) commented Sep 2, 2025

@DavisVaughan can you please take a look? It's been a while since I've touched C.

hadley mentioned this pull request on Sep 25, 2025
Comment on lines -78 to -81
```r
if (is_na(out) && .allow_na) {
  # Always return a logical NA
  return(NA)
}
```
Member

I've removed this because nothing else uses it.

This was a fairly weird bit of purrr. every(), none(), and some() strictly required the result of .p to be TRUE or FALSE with no casting, but were lax about NA, accepting NA_character_ and friends, which I don't think ever made much sense.

I've changed this in the C code to strictly require a scalar logical vector, and added tests for this.

This is a breaking change, but hopefully a very minor one.
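The stricter rule can be summarised in a couple of lines of R. is_valid_predicate_result() is a hypothetical helper written only to illustrate which predicate results are accepted under "a scalar logical vector"; it is not part of purrr's API.

```r
# Hypothetical helper, not purrr's API: the predicate result must be a
# length-1 logical vector (TRUE, FALSE, or a logical NA).
is_valid_predicate_result <- function(val) {
  is.logical(val) && length(val) == 1L
}

is_valid_predicate_result(TRUE)            # TRUE
is_valid_predicate_result(NA)              # TRUE: logical NA is still allowed
is_valid_predicate_result(NA_character_)   # FALSE: typed NAs are now rejected
is_valid_predicate_result(1L)              # FALSE: no casting from integer
is_valid_predicate_result(c(TRUE, FALSE))  # FALSE: must be length 1
```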

```diff
 lifecycle::deprecate_soft(
   when = "1.0.0",
-  what = I("Use of calls and pairlists in map functions"),
+  what = I("Use of calls and expressions in purrr functions"),
```
Member

I believe pairlists was a typo here.

We now use vctrs_vec_compat() in every(), some(), and none() for consistency with map() and friends. That requires tweaking this message a little, but I think that's fine, and it's unlikely to be seen by many people anyway.

```r
})

test_that("pairlists, expressions, and calls are deprecated", {
  local_options(lifecycle_verbosity = "warning")
```
Member

This makes sure it warns every time it is called, not just the first time.

Member

Lots of new tests to check all of the edge cases, our assumptions around what .p can return, and what we allow as input for .x.

@DavisVaughan (Member)

Thanks @ErdaradunGaztea, we will take it from here!

@hadley (Member) commented Sep 30, 2025

@DavisVaughan revdeps look good to me at a quick glance

@lionel- (Member) left a comment

LG!

DavisVaughan changed the title from "Optimize every() and related functions" to "Rewrite every(), some(), and none() in C" on Oct 2, 2025
DavisVaughan merged commit dca5b5a into tidyverse:main on Oct 2, 2025 (13 checks passed)

Development

Successfully merging this pull request may close these issues.

Performance of every(), some(), and none()
