Description
Is your feature request related to a problem? Please describe.
RAJA plugins are used by CHAI to make sure the data backing a ManagedArray is in the correct memory space and up to date. However, the current approach is not stream aware, which leads to suboptimal performance on GPU platforms. Where there is a dual memory space (CUDA), memory copies to the host are done on stream 0, which forces the whole device to synchronize. Where there is a single memory space (HIP), we have to synchronize the whole device to make sure the data is valid during host accesses.
Describe the solution you'd like
Making CHAI stream aware would be relatively straightforward if the camp resource used by RAJA were passed as an argument to the plugin functions. Additionally, the postLaunch function should receive an event with a wait method that CHAI can call when it needs to be sure the kernel has completed.
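To make the request concrete, here is a rough sketch of what a stream-aware plugin interface could look like. It is only an illustration: `Resource`, `Event`, `PluginContext`, `StreamAwarePlugin`, and `ArrayManagerPlugin` are hypothetical stand-ins written for this example, not the actual RAJA, camp, or CHAI types, and the "event" just flips a flag instead of blocking on a real GPU event.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a camp resource: identifies the stream a kernel uses.
struct Resource {
  int stream_id;
};

// Hypothetical stand-in for the event passed to postLaunch.
// A real event would block until the kernel finishes; here wait() only
// records that it was called by setting a flag.
struct Event {
  bool* completed;
  void wait() const {
    assert(completed != nullptr);
    *completed = true;
  }
};

// Hypothetical stand-in for the context RAJA hands to plugins.
struct PluginContext {
  std::string kernel_name;
};

// Assumed shape of the requested interface: the resource is passed to
// preLaunch, and postLaunch receives an event tied to the launched kernel.
struct StreamAwarePlugin {
  virtual ~StreamAwarePlugin() = default;
  virtual void preLaunch(const PluginContext& ctx, Resource& res) = 0;
  virtual void postLaunch(const PluginContext& ctx, Event done) = 0;
};

// Toy CHAI-like plugin that tracks which stream to use and which kernels
// are still outstanding.
class ArrayManagerPlugin : public StreamAwarePlugin {
public:
  void preLaunch(const PluginContext&, Resource& res) override {
    // Data movement would be scheduled on this stream instead of stream 0.
    copy_streams.push_back(res.stream_id);
  }

  void postLaunch(const PluginContext&, Event done) override {
    // Keep the event; do not synchronize yet.
    pending.push_back(done);
  }

  // Called right before the host touches the data: wait only on the
  // kernels that were recorded, not on the whole device.
  void hostAccess() {
    for (const Event& e : pending) {
      e.wait();
    }
    pending.clear();
  }

  std::vector<int> copy_streams;
  std::vector<Event> pending;
};
```

The point of the sketch is that synchronization is deferred until a host access actually needs it, and it is limited to the events CHAI has recorded rather than covering the whole device.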
Describe alternatives you've considered
Instead of modifying the plugin, RAJA could set some global state that is accessible when the plugin methods are called.
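For comparison, the global-state alternative might look roughly like the sketch below. Everything here is hypothetical (`StreamResource`, `currentResource`, `ScopedResource`, `streamForCopies`); it assumes RAJA would set a thread-local pointer around the plugin calls and that the plugin methods keep their existing signatures and read that pointer.

```cpp
#include <cassert>

// Hypothetical stand-in for a camp resource.
struct StreamResource {
  int stream_id;
};

// Thread-local "current resource" slot, so host threads launching on
// different streams do not overwrite each other's state.
inline const StreamResource*& currentResource() {
  thread_local const StreamResource* current = nullptr;
  return current;
}

// RAII guard the launching code would create around the plugin calls.
class ScopedResource {
public:
  explicit ScopedResource(const StreamResource& res)
      : previous_(currentResource()), mine_(&res) {
    currentResource() = mine_;
  }

  ~ScopedResource() {
    // Guards are expected to be destroyed in reverse order of creation.
    assert(currentResource() == mine_);
    currentResource() = previous_;
  }

  ScopedResource(const ScopedResource&) = delete;
  ScopedResource& operator=(const ScopedResource&) = delete;

private:
  const StreamResource* previous_;
  const StreamResource* mine_;
};

// What a plugin method could do with an unchanged signature: look up the
// resource from the global slot and fall back to stream 0 if none is set.
inline int streamForCopies() {
  const StreamResource* res = currentResource();
  return res ? res->stream_id : 0;
}
```

This avoids changing the plugin signatures, but the dependency on the resource becomes implicit, and there is no obvious place to hand the plugin a completion event.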
Additional context
Umpire is working on camp resource aware allocators (llnl/Umpire#901), which CHAI will also be using.
Also, note that even if only one stream is being used in an application, this new approach will be more efficient than synchronizing across the whole device.