Skip to content

[SYCL][Graph] async_malloc use allocation size for zeVirtualMemQueryPageSize #19402

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: sycl
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions sycl/cmake/modules/FetchUnifiedRuntime.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,9 @@ set(UR_BUILD_TESTS "${SYCL_UR_BUILD_TESTS}" CACHE BOOL "" FORCE)
# UR tests require the examples to be built
set(UR_BUILD_EXAMPLES "${SYCL_UR_BUILD_TESTS}" CACHE BOOL "" FORCE)

option(SYCL_UR_FORMAT_CPP_STYLE "Format code style of UR C++ sources" OFF)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this? Can't we just pass UR_FORMAT_CPP_STYLE when configuring SYCL?

set(UR_FORMAT_CPP_STYLE "${SYCL_UR_FORMAT_CPP_STYLE}" CACHE BOOL "" FORCE)

# Here we override the defaults to unified-runtime
set(UR_BUILD_XPTI_LIBS OFF CACHE BOOL "")
set(UR_ENABLE_SYMBOLIZER ON CACHE BOOL "Enable symbolizer for sanitizer layer.")
Expand Down
5 changes: 5 additions & 0 deletions sycl/include/sycl/ext/oneapi/virtual_mem/virtual_mem.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,11 @@ __SYCL_EXPORT size_t
get_mem_granularity(const device &SyclDevice, const context &SyclContext,
granularity_mode Mode = granularity_mode::recommended);

__SYCL_EXPORT size_t get_mem_granularity(const device &SyclDevice,
const context &SyclContext,
granularity_mode Mode,
size_t allocationSize);

__SYCL_EXPORT size_t
get_mem_granularity(const context &SyclContext,
granularity_mode Mode = granularity_mode::recommended);
Expand Down
3 changes: 2 additions & 1 deletion sycl/source/detail/graph/memory_pool.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,8 @@ void *graph_mem_pool::malloc(size_t Size, usm::alloc AllocType,
context_impl &CtxImpl = *getSyclObjImpl(MContext);
adapter_impl &Adapter = CtxImpl.getAdapter();

size_t Granularity = get_mem_granularity(MDevice, MContext);
const size_t Granularity = get_mem_granularity(
MDevice, MContext, granularity_mode::recommended, Size);
uintptr_t StartPtr = 0;
size_t AlignedSize = alignByteSize(Size, Granularity);
// See if we can find an allocation to reuse
Expand Down
15 changes: 12 additions & 3 deletions sycl/source/virtual_mem.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,8 @@ namespace ext::oneapi::experimental {

__SYCL_EXPORT size_t get_mem_granularity(const device &SyclDevice,
const context &SyclContext,
granularity_mode Mode) {
granularity_mode Mode,
size_t allocationSize) {
Comment on lines +27 to +28
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be an ABI-breaking change, we can't just do it like that.

Copy link
Contributor Author

@lslusarczyk lslusarczyk Jul 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pointing it out. How shall I make this change? I guess there is some way of changing ABI.
I see: sycl/doc/developer/ABIPolicyGuide.md

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is part of an experimental extension https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_virtual_mem.asciidoc Acording to https://github.com/intel/llvm/blob/sycl/sycl/doc/developer/ABIPolicyGuide.md Features clearly marked as experimental are considered as an exception to this guideline.

I understand although ABI change is possible I should get consensus on API change with virtual_mem extension owners. @steffenlarsen , @aelovikov-intel , @aarongreig - you are code contributors for virtual mem, should I discuss it with you, some of you, some other people too? I'd like to start email thread with interested people about this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in this case it would be as simple as using two overloads instead of default value for the allocationSize. Was previous behavior the same as = 1 case of the new impl?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Previous behavior is the same as new one with =1.
Ok. I'm adding an additional overload. Thank you for suggestion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Please check if the signature looks OK. If so I'll update extension documentation.

if (!SyclDevice.has(aspect::ext_oneapi_virtual_mem))
throw sycl::exception(
sycl::make_error_code(sycl::errc::feature_not_supported),
Expand All @@ -45,20 +46,28 @@ __SYCL_EXPORT size_t get_mem_granularity(const device &SyclDevice,
#ifndef NDEBUG
size_t InfoOutputSize = 0;
Adapter->call<sycl::detail::UrApiKind::urVirtualMemGranularityGetInfo>(
urCtx, urDevice, GranularityQuery, 0, nullptr, &InfoOutputSize);
urCtx, urDevice, allocationSize, GranularityQuery, 0, nullptr,
&InfoOutputSize);
assert(InfoOutputSize == sizeof(size_t) &&
"Unexpected output size of granularity info query.");
#endif // NDEBUG
size_t Granularity = 0;
Adapter->call<sycl::detail::UrApiKind::urVirtualMemGranularityGetInfo>(
urCtx, urDevice, GranularityQuery, sizeof(size_t), &Granularity, nullptr);
urCtx, urDevice, allocationSize, GranularityQuery, sizeof(size_t),
&Granularity, nullptr);
if (Granularity == 0)
throw sycl::exception(
sycl::make_error_code(sycl::errc::invalid),
"Unexpected granularity result: memory granularity shouldn't be 0.");
return Granularity;
}

__SYCL_EXPORT size_t get_mem_granularity(const device &SyclDevice,
const context &SyclContext,
granularity_mode Mode) {
return get_mem_granularity(SyclDevice, SyclContext, Mode, 1);
}

__SYCL_EXPORT size_t get_mem_granularity(const context &SyclContext,
granularity_mode Mode) {
const std::vector<device> Devices = SyclContext.get_devices();
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
// RUN: %{build} -o %t.out
// RUN: %{run} %t.out
// Extra run to check for leaks in Level Zero using UR_L0_LEAKS_DEBUG
// RUN: %if level_zero %{env SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=0 %{l0_leak_check} %{run} %t.out 2>&1 | FileCheck %s --implicit-check-not=LEAK %}
// Extra run to check for immediate-command-list in Level Zero
// RUN: %if level_zero %{env SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 %{l0_leak_check} %{run} %t.out 2>&1 | FileCheck %s --implicit-check-not=LEAK %}

#define GRAPH_E2E_EXPLICIT

#include "../Inputs/async_alloc_different_sizes.cpp"
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
// Tests async allocations with different sizes.

#include "../../graph_common.hpp"
#include <sycl/ext/oneapi/experimental/async_alloc/async_alloc.hpp>

void asyncAllocWorksWithSize(size_t Size) {
queue Queue{};

exp_ext::command_graph Graph{Queue.get_context(), Queue.get_device()};

void *AsyncPtr = nullptr;
// Add alloc node
auto AllocNode = add_node(Graph, Queue, [&](handler &CGH) {
AsyncPtr = exp_ext::async_malloc(CGH, usm::alloc::device, Size);
});

add_node(
Graph, Queue,
[&](handler &CGH) {
depends_on_helper(CGH, AllocNode);
exp_ext::async_free(CGH, AsyncPtr);
},
AllocNode);

auto GraphExec = Graph.finalize();
}

int main() {
asyncAllocWorksWithSize(1);
asyncAllocWorksWithSize(131);
asyncAllocWorksWithSize(10071);
asyncAllocWorksWithSize(1007177);
asyncAllocWorksWithSize(191439360);
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
// RUN: %{build} -o %t.out
// RUN: %{run} %t.out
// Extra run to check for leaks in Level Zero using UR_L0_LEAKS_DEBUG
// RUN: %if level_zero %{env SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=0 %{l0_leak_check} %{run} %t.out 2>&1 | FileCheck %s --implicit-check-not=LEAK %}
// Extra run to check for immediate-command-list in Level Zero
// RUN: %if level_zero %{env SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 %{l0_leak_check} %{run} %t.out 2>&1 | FileCheck %s --implicit-check-not=LEAK %}

#define GRAPH_E2E_RECORD_REPLAY

#include "../Inputs/async_alloc_different_sizes.cpp"
3 changes: 3 additions & 0 deletions unified-runtime/include/ur_api.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions unified-runtime/include/ur_ddi.h

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 5 additions & 0 deletions unified-runtime/include/ur_print.hpp

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 3 additions & 0 deletions unified-runtime/scripts/core/virtual_memory.yml
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,9 @@ params:
[in][optional] is the device to get the granularity from, if the
device is null then the granularity is suitable for all devices in
context.
- type: size_t
name: allocationSize
desc: "[in] size in bytes of allocation size which granurality we search for."
- type: $x_virtual_mem_granularity_info_t
name: propName
desc: "[in] type of the info to query."
Expand Down
1 change: 1 addition & 0 deletions unified-runtime/source/adapters/cuda/virtual_mem.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@

UR_APIEXPORT ur_result_t UR_APICALL urVirtualMemGranularityGetInfo(
ur_context_handle_t, ur_device_handle_t hDevice,
[[maybe_unused]] size_t allocationSize,
ur_virtual_mem_granularity_info_t propName, size_t propSize,
void *pPropValue, size_t *pPropSizeRet) {
UrReturnHelper ReturnValue(propSize, pPropValue, pPropSizeRet);
Expand Down
4 changes: 2 additions & 2 deletions unified-runtime/source/adapters/hip/virtual_mem.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -14,8 +14,8 @@
#include "physical_mem.hpp"

UR_APIEXPORT ur_result_t UR_APICALL urVirtualMemGranularityGetInfo(
ur_context_handle_t, ur_device_handle_t, ur_virtual_mem_granularity_info_t,
size_t, void *, size_t *) {
ur_context_handle_t, ur_device_handle_t, size_t,
ur_virtual_mem_granularity_info_t, size_t, void *, size_t *) {
return UR_RESULT_ERROR_UNSUPPORTED_FEATURE;
}

Expand Down

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

7 changes: 4 additions & 3 deletions unified-runtime/source/adapters/level_zero/virtual_mem.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -23,8 +23,8 @@ namespace ur::level_zero {

ur_result_t urVirtualMemGranularityGetInfo(
ur_context_handle_t hContext, ur_device_handle_t hDevice,
ur_virtual_mem_granularity_info_t propName, size_t propSize,
void *pPropValue, size_t *pPropSizeRet) {
size_t allocationSize, ur_virtual_mem_granularity_info_t propName,
size_t propSize, void *pPropValue, size_t *pPropSizeRet) {
UrReturnHelper ReturnValue(propSize, pPropValue, pPropSizeRet);
switch (propName) {
case UR_VIRTUAL_MEM_GRANULARITY_INFO_MINIMUM:
Expand All @@ -34,7 +34,8 @@ ur_result_t urVirtualMemGranularityGetInfo(
// aligned size.
size_t PageSize;
ZE2UR_CALL(zeVirtualMemQueryPageSize,
(hContext->getZeHandle(), hDevice->ZeDevice, 1, &PageSize));
(hContext->getZeHandle(), hDevice->ZeDevice, allocationSize,
&PageSize));
return ReturnValue(PageSize);
}
default:
Expand Down
5 changes: 4 additions & 1 deletion unified-runtime/source/adapters/mock/ur_mockddi.cpp

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions unified-runtime/source/adapters/native_cpu/virtual_mem.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -13,8 +13,8 @@
#include "physical_mem.hpp"

UR_APIEXPORT ur_result_t UR_APICALL urVirtualMemGranularityGetInfo(
ur_context_handle_t, ur_device_handle_t, ur_virtual_mem_granularity_info_t,
size_t, void *, size_t *) {
ur_context_handle_t, ur_device_handle_t, size_t,
ur_virtual_mem_granularity_info_t, size_t, void *, size_t *) {
return UR_RESULT_ERROR_UNSUPPORTED_FEATURE;
}

Expand Down
4 changes: 2 additions & 2 deletions unified-runtime/source/adapters/opencl/virtual_mem.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -13,8 +13,8 @@
#include "physical_mem.hpp"

UR_APIEXPORT ur_result_t UR_APICALL urVirtualMemGranularityGetInfo(
ur_context_handle_t, ur_device_handle_t, ur_virtual_mem_granularity_info_t,
size_t, void *, size_t *) {
ur_context_handle_t, ur_device_handle_t, size_t,
ur_virtual_mem_granularity_info_t, size_t, void *, size_t *) {
return UR_RESULT_ERROR_UNSUPPORTED_FEATURE;
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -272,10 +272,13 @@ size_t GetKernelPrivateMemorySize(ur_kernel_handle_t Kernel,
size_t GetVirtualMemGranularity(ur_context_handle_t Context,
ur_device_handle_t Device) {
size_t Size;
const size_t allocationSize =
1; // probably we want to use actual allocation size
[[maybe_unused]] auto Result =
getContext()->urDdiTable.VirtualMem.pfnGranularityGetInfo(
Context, Device, UR_VIRTUAL_MEM_GRANULARITY_INFO_RECOMMENDED,
sizeof(Size), &Size, nullptr);
Context, Device, allocationSize,
UR_VIRTUAL_MEM_GRANULARITY_INFO_RECOMMENDED, sizeof(Size), &Size,
nullptr);
assert(Result == UR_RESULT_SUCCESS);
return Size;
}
Expand Down
10 changes: 7 additions & 3 deletions unified-runtime/source/loader/layers/tracing/ur_trcddi.cpp

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

6 changes: 4 additions & 2 deletions unified-runtime/source/loader/ur_ldrddi.cpp

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

6 changes: 4 additions & 2 deletions unified-runtime/source/loader/ur_libapi.cpp

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 2 additions & 0 deletions unified-runtime/source/ur_api.cpp

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading
Loading