Skip to content

[SYCL][NFC] Mark external libc functions. #19368

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: sycl
Choose a base branch
from

Conversation

hvdijk
Copy link
Contributor

@hvdijk hvdijk commented Jul 9, 2025

<sycl/detail/defines_elementary.hpp> defines a __DPCPP_SYCL_EXTERNAL macro for functions that are defined elsewhere. This macro is used both for functions defined by the platform's system libraries and for functions defined in libclc/libdevice. For Native CPU, these should be treated differently: functions defined by the platform's system libraries should be processed according to the platform's ABI, functions defined by libclc/libdevice should be processed in a host-target-independent way. Therefore, this PR introduces a __DPCPP_SYCL_EXTERNAL_LIBC macro so that libc functions can be tagged as such.

This PR does not yet make the changes to Native CPU to define __DPCPP_SYCL_EXTERNAL and __DPCPP_SYCL_EXTERNAL_LIBC differently.

<sycl/detail/defines_elementary.hpp> defines a __DPCPP_SYCL_EXTERNAL
macro for functions that are defined elsewhere. This macro is used both
for functions defined by the platform's system libraries and for
functions defined in libclc/libdevice. For Native CPU, these should be
treated differently: functions defined by the platform's system
libraries should be processed according to the platform's ABI, functions
defined by libclc/libdevice should be processed in a target-independent
way. Therefore, this PR introduces a __DPCPP_SYCL_EXTERNAL_LIBC macro so
that libc functions can be tagged as such.

This PR does not yet make the changes to Native CPU to define
__DPCPP_SYCL_EXTERNAL and __DPCPP_SYCL_EXTERNAL_LIBC differently.
@hvdijk hvdijk requested a review from a team as a code owner July 9, 2025 14:52
@hvdijk hvdijk requested a review from aelovikov-intel July 9, 2025 14:52
@hvdijk hvdijk temporarily deployed to WindowsCILock July 9, 2025 14:53 — with GitHub Actions Inactive
@hvdijk
Copy link
Contributor Author

hvdijk commented Jul 9, 2025

The naming is not ideal, I'm happy to take alternative suggestions for __DPCPP_SYCL_EXTERNAL_LIBC.

@aelovikov-intel
Copy link
Contributor

@bader , @AlexeySachkov , any input from you?

@hvdijk hvdijk temporarily deployed to WindowsCILock July 9, 2025 16:38 — with GitHub Actions Inactive
@hvdijk hvdijk temporarily deployed to WindowsCILock July 9, 2025 16:38 — with GitHub Actions Inactive
@bader
Copy link
Contributor

bader commented Jul 9, 2025

@bader , @AlexeySachkov , any input from you?

I know almost nothing about Native CPU, so I'm unsure the input I can provide is valuable. From the description alone, it's not clear what is the problem we are trying to solve. Follow-up patches are supposed to provide the justification for another macro.

For Native CPU, these should be treated differently: functions defined by the platform's system libraries should be processed according to the platform's ABI, functions defined by libclc/libdevice should be processed in a target-independent way.

According to my understanding libclc/libdevice are not target independent. We build dedicated version for each target separately.

@hvdijk
Copy link
Contributor Author

hvdijk commented Jul 9, 2025

Follow-up patches are supposed to provide the justification for another macro.

What we are currently applying in our internal Native CPU testing, and what I am working on cleaning up in order to hopefully be suitable to merge:

https://raw.githubusercontent.com/uxlfoundation/oneapi-construction-kit/65e886ce0cb7c1849601637ad87ee8a0d6b7e881/scripts/testing/patches/DPCPP-0003-SYCL-NativeCPU-Limit-generic-ABI-to-builtins.patch

What this PR prepares for is

diff --git a/sycl/include/sycl/detail/defines_elementary.hpp b/sycl/include/sycl/detail/defines_elementary.hpp
index 17107c9216b3..ba3401496849 100644
--- a/sycl/include/sycl/detail/defines_elementary.hpp
+++ b/sycl/include/sycl/detail/defines_elementary.hpp
@@ -25,7 +25,12 @@
 #endif // __SYCL_ALWAYS_INLINE
 
 #ifdef SYCL_EXTERNAL
+#ifdef __NativeCPU__
+#define __DPCPP_SYCL_EXTERNAL SYCL_EXTERNAL __attribute__((intel_ocl_bicc))
+#define __DPCPP_SYCL_EXTERNAL_LIBC SYCL_EXTERNAL
+#else
 #define __DPCPP_SYCL_EXTERNAL SYCL_EXTERNAL
+#endif
 #else
 #ifdef __SYCL_DEVICE_ONLY__
 #define __DPCPP_SYCL_EXTERNAL __attribute__((sycl_device))

where whether we actually use __attribute__((intel_ocl_bicc)) or switch to a Native CPU-specific new target attribute is still to be decided and out of scope for this PR.

According to my understanding libclc/libdevice are not target independent. We build dedicated version for each target separately.

Apologies, I should have phrased that differently. I'm aware we build libclc/libdevice for each target separately, but we don't build for each aux (host) target separately, nor should we. We just build libclc once for Native CPU, and that version can be used with any host target. That is, the same libclc files are used for clang++ --target=x86_64-linux-gnu -fsycl -fsycl-targets=native_cpu generating native code for x86_64, as for clang++ --target=aarch64-linux-gnu -fsycl -fsycl-targets=native_cpu generating native code for AArch64.

In order for this to work, we either need to default to the platform-independent ABI (calling into libclc) for translating to LLVM IR and explicitly mark those functions where we do need the host ABI (calling into libc), or we default to the host ABI and explicitly mark those functions where we need the platform-independent ABI. Either would work, and both approaches require this PR or something equivalent.

@bader
Copy link
Contributor

bader commented Jul 9, 2025

whether we actually use __attribute__((intel_ocl_bicc)) or switch to a Native CPU-specific new target attribute is still to be decided and out of scope for this PR.

intel_ocl_bicc was created for one very specific use case, which is not relevant to SYCL. Modifying SYCL headers to use calling convention for "Intel OpenCL CPU builtins" doesn't sound right me and goes against the purpose of Native CPU as I understand it. This calling convention works only for x86 processors, moreover it's supposed to be used only for SVML functions implementing OpenCL built-in functions.

Apologies, I should have phrased that differently. I'm aware we build libclc/libdevice for each target separately, but we don't build for each aux (host) target separately, nor should we.

I'm lost here. libclc is written in OpenCL. OpenCL compiler doesn't have the concept of "aux-target".
libdevice is written in SYCL, but AFAIK it's so just to simplify the integration with the SYCL code. It's device-only library with zero host code.

We just build libclc once for Native CPU, and that version can be used with any host target. That is, the same libclc files are used for clang++ --target=x86_64-linux-gnu -fsycl -fsycl-targets=native_cpu generating native code for x86_64, as for clang++ --target=aarch64-linux-gnu -fsycl -fsycl-targets=native_cpu generating native code for AArch64.

When you say "build", what exactly are we producing? My assumption it's LLVM IR.
To be honest, I don't understand how Native CPU is supposed to work as LLVM IR is target dependent by design. For instance, what is the size of the pointer for Native CPU (note: some libclc functions have pointer parameters)? In your example all targets are using 64-bit pointer size, but I don't think you can re-use "same libclc files" for CPUs with 32-bit pointers.

I would prefer this patch to be reviewed by someone who understands Native CPU architecture and this issue. It will take me some time to dig deep into this issue and I don't want @hvdijk to waste his time on dealing with all my questions.

@hvdijk
Copy link
Contributor Author

hvdijk commented Jul 10, 2025

whether we actually use __attribute__((intel_ocl_bicc)) or switch to a Native CPU-specific new target attribute is still to be decided and out of scope for this PR.

intel_ocl_bicc was created for one very specific use case, which is not relevant to SYCL.

Agreed, it is a bit of a misuse of the calling convention, that's why I'm looking at changing that before submitting that as a PR. We need some calling convention that we give special treatment, and in our local patches we just went with intel_ocl_bicc because it was available and was convenient to get things working, it meant we could avoid adding a new keyword. In what we will end up submitting as a PR, we may want to do that differently.

Modifying SYCL headers to use calling convention for "Intel OpenCL CPU builtins" doesn't sound right me and goes against the purpose of Native CPU as I understand it. This calling convention works only for x86 processors, moreover it's supposed to be used only for SVML functions implementing OpenCL built-in functions.

It's a calling convention that in that patch is handled entirely by the Native CPU target, not by the x86 target: we bypass the host target's regular handling for it. This really does work with Native CPU on x86_64, on AArch64, and on RISC-V. Even if in what we end up submitting as a PR, we end up using a different calling convention, the approach will be the same. In that patch, that is handled by the change to clang/lib/CodeGen/Targets/NativeCPU.cpp.

Apologies, I should have phrased that differently. I'm aware we build libclc/libdevice for each target separately, but we don't build for each aux (host) target separately, nor should we.

I'm lost here. libclc is written in OpenCL. OpenCL compiler doesn't have the concept of "aux-target". libdevice is written in SYCL, but AFAIK it's so just to simplify the integration with the SYCL code. It's device-only library with zero host code.

The idea of Native CPU is that SYCL device code runs directly on the host CPU, so any device code is host code.

We just build libclc once for Native CPU, and that version can be used with any host target. That is, the same libclc files are used for clang++ --target=x86_64-linux-gnu -fsycl -fsycl-targets=native_cpu generating native code for x86_64, as for clang++ --target=aarch64-linux-gnu -fsycl -fsycl-targets=native_cpu generating native code for AArch64.

When you say "build", what exactly are we producing? My assumption it's LLVM IR.

For the initial device compilation, we indeed produce LLVM IR. Then, at link time, when we embed that LLVM IR in the host executable, we translate that device LLVM IR using the host backend to produce machine code for whatever target is selected.

To be honest, I don't understand how Native CPU is supposed to work as LLVM IR is target dependent by design. For instance, what is the size of the pointer for Native CPU (note: some libclc functions have pointer parameters)?

Native CPU does not have things like pointer size, it copies all those attributes from the host. It cannot generally be used standalone. There are a few other targets in LLVM upstream already that similarly copy a lot of attributes from the host, that handling was taken from the NVPTX and SPIR targets.

In your example all targets are using 64-bit pointer size, but I don't think you can re-use "same libclc files" for CPUs with 32-bit pointers.

In general, you're right, Native CPU LLVM IR is not host-target-independent, nor is it meant to be. In libclc, however, we avoid anything that results in LLVM IR that varies between host targets and we have nothing that depends on the size of pointers, and those LLVM IR files are meant to be host-target-independent.

As far as I am aware, DPC++ does not support SYCL on CPUs with 32-bit pointers. In a local build, months ago, I tried it and used the compiler to build SYCL-CTS (for spir32). It did not go well. There is also no CI to test this, and no documentation that suggests this should work. If that ever changes, though, it should also work for Native CPU as well. I can double-check later this week that we have no libclc/libdevice bugs in this area just to be sure (separate from this PR).

I would prefer this patch to be reviewed by someone who understands Native CPU architecture and this issue. It will take me some time to dig deep into this issue and I don't want @hvdijk to waste his time on dealing with all my questions.

I'm happy to wait for someone else to review but they're good questions to ask to get answers written out for later reference.

@hvdijk
Copy link
Contributor Author

hvdijk commented Jul 10, 2025

I can double-check later this week that we have no libclc/libdevice bugs in this area just to be sure (separate from this PR).

I might as well update here anyway: we don't have problems with pointers directly, we do have problems with assuming size_t is 64 bits. But that and pointer size (which are going to be the same) have become a general assumption across DPC++ nowadays: trying a 32-bit build I get errors because of that in unified-memory-framework, in libdevice, in libsycl, in unified-runtime, in sycl-jit. If there is any desire to fix this, whatever gets done to fix this for other parts of libdevice will also be possible to do for Native CPU.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants