Skip to content

Conversation

@RobertLeahy
Copy link
Contributor

@RobertLeahy RobertLeahy commented Dec 16, 2025

Previously the operation states for let_value, let_error, and let_stopped directly contained the operation states of one or two child operations. When first connected this number was one, and after the completion of that first operation and the connection of the sender returned by the wrapped invocable the number was two. Notably when the second operation state is connected the first operation has completed.

P3373 proposes changing the above behavior such that after the tuple of values sent by the first operation is populated the operation state for the first operation is destroyed. Thereafter that storage can be reused for the operation state of the second operation thereby:

  • Release resources held by the first operation state sooner, and
  • Reducing the size of the operation states of let_value, let_error, and let_stopped

This commit implements the above-described change.

…mpletion

Previously the operation states for let_value, let_error, and
let_stopped directly contained the operation states of one or two child
operations. When first connected this number was one, and after the
completion of that first operation and the connection of the sender
returned by the wrapped invocable the number was two. Notably when the
second operation state is connected the first operation has completed.

P3373 proposes changing the above behavior such that after the tuple of
values sent by the first operation is populated the operation state for
the first operation is destroyed. Thereafter that storage can be reused
for the operation state of the second operation thereby:

- Release resources held by the first operation state sooner, and
- Reducing the size of the operation states of let_value, let_error, and
  let_stopped

This commit implements the above-described change.
@copy-pr-bot
Copy link

copy-pr-bot bot commented Dec 16, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Comment on lines +306 to +325
const auto __impl = [&]() noexcept(__nothrow) {
auto& __tuple =
__state.__args_.emplace_from(__tup::__mktuple, static_cast<_Args&&>(__args)...);
auto&& __sender = __tuple.apply(static_cast<_Fun&&>(__state.__fun_), __tuple);
__state.__storage_.template emplace<__monostate>();
__second_rcvr_t __r{__rcvr, static_cast<__env2_t&&>(__state.__env2_)};
auto& __op = __state.__storage_.template emplace<__submit_t>(
static_cast<__sender_t&&>(__sender), static_cast<__second_rcvr_t&&>(__r));
__op.submit(static_cast<__sender_t&&>(__sender), static_cast<__second_rcvr_t&&>(__r));
};
if constexpr (__nothrow) {
__impl();
} else {
STDEXEC_TRY {
__impl();
}
STDEXEC_CATCH_ALL {
::stdexec::set_error(static_cast<_Receiver&&>(__rcvr), std::current_exception());
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i learned from lewis that it is simpler to use try unconditionally, and only guard the set_error call in the catch block. the optimized codegen is identical.

Suggested change
const auto __impl = [&]() noexcept(__nothrow) {
auto& __tuple =
__state.__args_.emplace_from(__tup::__mktuple, static_cast<_Args&&>(__args)...);
auto&& __sender = __tuple.apply(static_cast<_Fun&&>(__state.__fun_), __tuple);
__state.__storage_.template emplace<__monostate>();
__second_rcvr_t __r{__rcvr, static_cast<__env2_t&&>(__state.__env2_)};
auto& __op = __state.__storage_.template emplace<__submit_t>(
static_cast<__sender_t&&>(__sender), static_cast<__second_rcvr_t&&>(__r));
__op.submit(static_cast<__sender_t&&>(__sender), static_cast<__second_rcvr_t&&>(__r));
};
if constexpr (__nothrow) {
__impl();
} else {
STDEXEC_TRY {
__impl();
}
STDEXEC_CATCH_ALL {
::stdexec::set_error(static_cast<_Receiver&&>(__rcvr), std::current_exception());
}
}
STDEXEC_TRY {
auto& __tuple =
__state.__args_.emplace_from(__tup::__mktuple, static_cast<_Args&&>(__args)...);
auto&& __sender = __tuple.apply(static_cast<_Fun&&>(__state.__fun_), __tuple);
__state.__storage_.template emplace<__monostate>();
__second_rcvr_t __r{__rcvr, static_cast<__env2_t&&>(__state.__env2_)};
auto& __op = __state.__storage_.template emplace<__submit_t>(
static_cast<__sender_t&&>(__sender), static_cast<__second_rcvr_t&&>(__r));
__op.submit(static_cast<__sender_t&&>(__sender), static_cast<__second_rcvr_t&&>(__r));
}
STDEXEC_CATCH_ALL {
if constexpr (!__nothrow) {
::stdexec::set_error(static_cast<_Receiver&&>(__rcvr), std::current_exception());
}
}

Comment on lines +573 to +576
template <typename _Sender>
using __sender_of = decltype((std::declval<__data_of<_Sender>>().__sndr));
template <typename _Sender>
using __fun_of = decltype((std::declval<__data_of<_Sender>>().__fun));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

microbenchmarks showed that this somewhat bizarre implementation of declval compiles about 2x faster than the standard one:

  template <class _Tp, bool _Noexcept = true>
  using __declfn_t = auto (*)() noexcept(_Noexcept) -> _Tp;

  template <class _Tp>
  extern __declfn_t<_Tp &&> __declval;
Suggested change
template <typename _Sender>
using __sender_of = decltype((std::declval<__data_of<_Sender>>().__sndr));
template <typename _Sender>
using __fun_of = decltype((std::declval<__data_of<_Sender>>().__fun));
template <typename _Sender>
using __sender_of = decltype((__declval<__data_of<_Sender>>().__sndr));
template <typename _Sender>
using __fun_of = decltype((__declval<__data_of<_Sender>>().__fun));

Comment on lines 513 to +586
auto operator()(_Sender&& __sndr, _Fun __fun) const -> __well_formed_sender auto {
return __make_sexpr<__let_t<_SetTag>>(
static_cast<_Fun&&>(__fun), static_cast<_Sender&&>(__sndr));
__data_t{static_cast<_Sender&&>(__sndr), static_cast<_Fun&&>(__fun)});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a problem. the structured binding of let_value is part of its interface. this should compile:

auto [tag, fn_copy, sndr_copy] = stdexec::let_value(sndr, fn);

you can use transform_sender to swivel the fn and the sndr into a __data_t object. exec::repeat_n does this, so you can crib from that.

return __sndr.apply(
static_cast<_Sender&&>(__sndr),
[&]<class _Fn, class _Child>(__ignore, _Fn&& __fn, _Child&& __child) {
// TODO(ericniebler): this needs a fallback
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please don't loose this comment. i need to come back to this.

@ericniebler
Copy link
Collaborator

/ok to test 9f66f97

@ericniebler
Copy link
Collaborator

/ok to test db1b0bb

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants